Pipeline configuration unit protocols and communication

ABSTRACT

An example method of controlling a data processing system having a cellular structure. The method includes transmitting a first configuration word to a first processing unit in the cellular structure. The method also includes processing data with the first processing unit in accordance with the first configuration word. The method also includes transmitting a second configuration word to the first processing unit. The method also includes transmitting a reconfiguration signal to the first unit, the reconfiguration signal indicating that the first unit should begin processing data in accordance with the second configuration word. If the first processing unit has completed processing data in accordance with the first configuration word prior to when the reconfiguration signal is received by the first processing unit, data may be processed by the first processing unit in accordance with the second configuration word. If the first processing unit has not completed processing data in accordance with the first configuration word, data may continue to be processed with the first processing unit in accordance with the first configuration word.

[0001] The present invention relates to methods which permit efficient configuration and reconfiguration of one or more reconfigurable subassemblies by one or more configuration units (CT) at high frequencies. This describes how an efficient and synchronized network can be created to control multiple CTs.

[0002] The term subassembly or cell includes conventional FPGA cells, bus systems, memories, peripherals and ALUs as well as arithmetic units of processors. Reference is made in this regard to the definitions by the present applicant/assignee. In particular, a subassembly is understood to be any type of configurable and reconfigurable elements. For parallel computer systems, a subassembly can be understood as a complete node of any function (in particular, however, arithmetic, memory and data transmission functions).

[0003] The method described here can be used in particular for integrated modules having a plurality of subassemblies in a one-dimensional or multidimensional arrangement, interconnected directly or through a bus system.

[0004] The generic definition of modules includes systolic arrays, neural networks, multiprocessor systems, processors having multiple arithmetic units and logic cells, as well as known modules of the types FPGA, DPGA, XPUTER, etc.

[0005] In the following description, modules of an architecture whose arithmetic units and bus systems are freely configurable are used. This architecture has already been published in German Patent 4416881 as well as PACT02, PACT08, PACT10, PACT13 and is referred to below as VPU. This architecture is composed of any desired arithmetic cells, logic cells (including memories) or communicative (IO) cells (PAEs), which may be arranged in a one-dimensional or multidimensional matrix (PA), and the matrix may have different cells of any design, the bus systems also being understood to be cells. The matrix as a whole or parts thereof are assigned a configuration unit (CT) which influences the interconnections and function of the PA.

[0006] A special property of VPUs is the automatic and deadlock-free reconfiguration at run time. Protocols and methods required for this are known from PACT04, 05, 08, 10 and 13, the full content of which is included here through this reference. The publication numbers for these internal file numbers can be found in the addendum.

[0007] 1. Initial States of PAEs and Bus Protocol of the Configuration

[0008] Each PAE is allocated states that influence configurability. Whether these states are locally coded or are managed through one or more switch groups, in particular the CT itself, is irrelevant here. A PAE has at least two states:

[0009] “Not configured”: In this state, the PAE is inactive and is not processing any data and/or triggers. In particular, it does not receive any data and/or triggers, nor does it generate any data and/or triggers. Only data and/or triggers relevant to the configuration can be received and/or processed. The PAE is completely neutral and may be configured. However, the possibility of initializing the registers for the data and/or triggers to be processed, in particular by the CT in this state, should also be mentioned here.

[0010] “Configured”: The function and interconnection of the PAE are configured. The PAE processes and generates data and/or triggers to be processed. Such states may also be present repeatedly, largely independently of one another, in independent parts of a PAE.

[0011] Inasmuch as the relevant separation here between data and/or triggers for processing on the one hand and data and/or triggers for configuration of one or more cells on the other hand may be apparent from the context below, it is not explained explicitly in detail.

[0012] During configuration, the CT sends together with a valid configuration word (KW) a signal indicating its validity (RDY). This may be omitted if validity is ensured by some other means, e.g., in the case of continuous transmission or by a code in the KW. In addition, the address of the PAE to be configured is generally coded in a KW.

[0013] According to the criteria described below and in the patent applications referenced, a PAE decides whether it can accept the KW and alter its configuration or whether data processing must not be interrupted or corrupted by a new configuration. In any case, the information regarding whether or not configurations are accepted is relayed to the CT if the decision has not already been made there. The following protocol would be possible: If a PAE accepts the configuration, it sends an acknowledgment ACK to the CT. If the configuration is rejected, a PAE will indicate this by sending REJ (reject) to the CT.

[0014] Within the data processing elements (PAEs), a decision is made by one or more of these elements regarding whether it/they can be reconfigured because the data processing is concluded or whether they are still processing data. In addition, no data is corrupted due to unconfigured PAEs.

[0015] 2. Deadlock Freedom and Correctness of the Data

[0016] 2.1. FILMO Principle

[0017] It is important here to have efficient management of a plurality of configurations, each of which may be composed of one or more KWs and possibly additional control commands and can be configured overlappingly on the PA. This is due to the fact that there is often a great distance between the CT and the cell(s) to be configured, which is a disadvantage in the transmission of configurations. At the same time, the technology ensures that no data or states are corrupted due to a reconfiguration. To achieve this, the following rules, which are called the FILMO principle, are defined:

[0018] a) PAEs which are currently processing data are not reconfigured. A reconfiguration should take place only when data processing is completely concluded or it is certain that no further data processing is necessary. (Explanation: Reconfiguration of PAEs which are currently processing data or are waiting for outstanding data leads to faulty calculation or loss of data.)

[0019] b) The status of a PAE should not change from “configured” to “not configured” during a FILMO run. In addition to the method according to PACT10, a special additional method which allows exceptions (explicit/implicit LOCK) is described below. This is explained as follows: If a SubConf is understood as a quantity of configuration words to be configured jointly into the cell array at a given time or for a given purpose, then a situation can occur where two different SubConfs (A, D) are supposed to share the same resources, in particular a PAE X. For example, SubConf A may chronologically precede SubConf D. SubConf A must therefore occupy the resources before SubConf D. If PAE X is still “configured” at the configuration time of SubConf A, but its status changes to “not configured” before the configuration of SubConf D, then a deadlock situation may occur if no special measures are taken, namely if SubConf A can no longer configure PAE X while SubConf D occupies only PAE X, for example, but cannot configure the remaining resources, which are already occupied by SubConf A. Neither SubConf A nor SubConf D can be executed. A deadlock would occur.

[0020] c) A SubConf should have either successfully configured or allocated all the PAEs belonging to it or it should have received a reject (REJ) before the following SubConf is configured. However, this is true only if the two configurations share the same resources entirely or in part. If there is no resource conflict, the two SubConfs can be configured independently of one another. Even if PAEs reject a configuration (REJ) for a SubConf, the configuration of the following SubConfs is still performed. Since the status of PAEs does not change during a FILMO run (LOCK, according to section b), this ensures that no PAEs which would have required the preceding configuration are configured during the following configuration. Explanation: A deadlock would occur if a SubConf which is to be configured later were to allocate PAEs needed by a SubConf which is to be configured earlier, because no SubConf could be configured completely any longer.

[0021] d) Within one SubConf, it may be necessary for certain PAEs to be configured or started in a certain sequence. Explanation: A PAE may be switched to a bus, for example, only after the bus has also been configured for the SubConf. Switching to a different bus would lead to processing of false data.

[0022] e) In the case of certain algorithms, the sequence in the configuration of SubConf must correspond exactly to the sequence of triggers arriving at the CT. For example, if the trigger which initiates the configuration of SubConf 1 arrives before the trigger which initiates the configuration of SubConf 3, then SubConf 1 must be configured completely before SubConf 3 may be configured.

[0023] If the order of triggers were reversed, this could lead to a defective sequence of subgraphs, depending on the algorithm (SubConf, see PACT13). Methods which meet most or all of the requirements listed above are known from PACT05 and PACT10.

[0024] Management of the configurations, their timing and arrangement and the design of the respective components, in particular the configuration registers, etc., has proven to be fundamental for the technique described here, however, and possible improvements in the known related art should therefore also be mentioned.

[0025] The object of the present invention is to provide a novel development for commercial application.

[0026] This object is achieved in the independent form claimed here. Preferred embodiments can be found in the subordinate claims.

[0027] To ensure that requirement e) is met as needed, it is proposed that the triggers received, which pertain to the status of a SubConf and a cell and/or reconfigurability, should be stored in the correct sequence by way of a simple FIFO, especially one allocated to the CT. Each FIFO entry contains the triggers received in a clock cycle, and in particular all the triggers received in one clock cycle can be stored. If there are no triggers, no FIFO entry is generated. The CT processes the FIFO in the sequence of the triggers received. If one entry contains multiple triggers, the CT will first process each trigger individually, optionally either (i) prioritized or (ii) unprioritized, before processing the next FIFO entry. Since a trigger is usually sent to the CT only once per configuration, it is usually sufficient to define the maximum depth of the FIFO relative to the quantity of all trigger lines wired to the CT. As an alternative method, a time stamp protocol according to PACT18 may also be used.
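
The trigger FIFO described in the preceding paragraph can be illustrated with a minimal Python sketch. It is not taken from the referenced applications: the class name TriggerFifo, the methods clock and drain, and the representation of triggers as integer line numbers are assumptions made purely for illustration.

```python
from collections import deque


class TriggerFifo:
    """Hypothetical model: one FIFO entry per clock cycle that carried triggers."""

    def __init__(self, max_depth):
        # Assumed sizing rule: depth chosen relative to the number of trigger lines.
        self.max_depth = max_depth
        self.entries = deque()

    def clock(self, triggers):
        """Record the triggers seen in one clock cycle; empty cycles create no entry."""
        if not triggers:
            return
        if len(self.entries) >= self.max_depth:
            raise OverflowError("trigger FIFO full")
        self.entries.append(list(triggers))

    def drain(self, priority=None):
        """Yield triggers entry by entry, oldest entry first.

        Within one entry the triggers are yielded either in stored order
        (unprioritized) or sorted by the given key function (prioritized).
        """
        while self.entries:
            entry = self.entries.popleft()
            if priority is not None:
                entry = sorted(entry, key=priority)
            yield from entry


fifo = TriggerFifo(max_depth=4)
fifo.clock([3])        # cycle 1: trigger line 3
fifo.clock([])         # cycle 2: nothing received, no entry generated
fifo.clock([7, 1])     # cycle 3: two triggers in the same cycle
depth_before = len(fifo.entries)
order = list(fifo.drain(priority=lambda line: line))
```

In this sketch the trigger from the earlier cycle is always handled before any trigger from a later cycle; the priority function only reorders triggers that share one entry.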

[0028] Two basic types of FILMO are known from PACT10:

[0029] Separate FILMO: The FILMO is designed as a separate memory and is separated from the normal CT memory which caches the SubConf. Only KWs that could not be configured in the PA are copied to the FILMO.

[0030] Integrated FILMO: The FILMO is integrated into the CT memory. KWs that could not be configured are managed by using flags and pointers.

[0031] The methods according to the present invention as described below can either be applied to both types of FILMO or to one certain type.

[0032] 2.2. Differential Reconfiguration

[0033] With many algorithms, it is advisable to make only minimal changes in configuration during operation on the basis of certain events represented by triggers or by time tuning without completely deleting the configuration of the PAEs. In most cases, this applies to the wiring of the bus systems or to certain constants. For example, if only one constant is to be changed, it is advisable to be able to write a KW to the respective PAE without the PAE being in an “unconfigured” state. This reduces the amount of configuration data to be transferred and is therefore advantageous. This can be achieved with a “differential reconfiguration” configuration mode, where the KW contains the information DIFFERENTIAL either in encoded form or explicitly in writing the KW. DIFFERENTIAL indicates that the KW is to be sent to a PAE that has already been configured. The acceptance of the differential configuration and the acknowledgment are precisely inverted from the normal configuration; a configured PAE receives the KW and sends an ACK. An unconfigured PAE rejects acceptance of the KW and sends REJ because the prerequisite for DIFFERENTIAL is a configured PAE.

[0034] There are various possibilities for performing a differential reconfiguration. One option is to force the differential reconfiguration without regard for the data processing operation actually taking place in a cell; in that case, it is desirable to guarantee accurate synchronization with the data processing, which can be accomplished through appropriate design and layout of the program. To relieve the programmer of this job, however, differential reconfigurability may also be made to depend on other events, e.g., the existence of a certain state in another cell or in the cell that is to be partially reconfigured. Therefore, an especially preferred variant is to store the configuration data, in particular the differential configuration data, in or on the cell, e.g., in a dedicated register, and then call up the register contents, depending on a certain state, and enter them into the cell. This may be accomplished, for example, by switching of a multiplexer.

[0035] The wave reconfiguration methods described below may also be used. It may optionally also be advisable to make a differential configuration dependent on the results (ACK/REJ) of a configuration performed previously in the normal manner. In other words, the differential configuration is performed only after arrival of ACK for the previous nondifferential configuration.

[0036] An especially preferred variant of synchronization of the differential configuration can be used in general, depending on how many different differential configurations are in fact needed. This is made possible by the fact that the differential configuration is not prestored locally, but instead on recognition of a certain state, e.g., the end of a data input or the like, a signal is generated with a first cell, stopping the cell which is to be differentially reconfigured. Such a signal may be a STOP signal. After or simultaneously with stopping data processing in the cell which is to be reconfigured differentially, a signal is sent to the CT, requesting differential reconfiguration of the stopped cell. This request signal for differential reconfiguration can be generated and sent in particular by the cell which also generates the STOP signal. The CT will then send the data needed for differential reconfiguration to the stopped cell and will thus trigger the differential reconfiguration. After differential reconfiguration, the STOP mode is terminated. This can be accomplished by the CT in particular. It should be pointed out that cache techniques can also be used in the differential reconfiguration method.

[0037] 3. Function of Triggers

[0038] Triggers are preferably used in VPU modules to transmit simple information as listed below as examples. Triggers are transmitted by any desired bus system (network), in particular a configurable bus system. The source and target of a trigger can be programmed.

[0039] A plurality of triggers can be transmitted simultaneously within a module. In special embodiments, in addition to direct transmission from a source to a target, transmission from one source to multiple destinations or from multiple sources to one destination is also possible.

[0040] Triggers transmit mainly, but not exclusively, the following:

[0041] Status information from arithmetic units (ALUs), such as

[0042] carry

[0043] division by zero

[0044] zero

[0045] negative

[0046] underflow/overflow

[0047] Results of comparisons

[0048] n-bit information (for small n)

[0049] Interrupt request generated internally or externally

[0050] Blocking and enable orders

[0051] Requests for configurations

[0052] Triggers are generated by any cells and are triggered in the individual cells by any events. For example, the status register and/or the flag register can be used by ALUs or processors to generate triggers according to the related art. Triggers can also be generated by a CT and/or an external unit arranged outside the cell array or the module.

[0053] Triggers are received by any number of cells and are analyzed in any manner. In particular, triggers can be analyzed by a CT or an external unit arranged outside the cell array or the module.

[0054] An important area for use of triggers is for synchronization and control of conditional executions and/or sequence controls in the array, which can be implemented by sequencers, for example, as well as their information exchange.

[0055] 3.1. Semantics of Triggers

[0056] Triggers are used for the following actions within PAEs, for example:

[0057] STEP: Execute an operation within a PAE upon receipt of the trigger.

[0058] GO: Execute operations within a PAE upon receipt of the trigger. The execution is stopped by STOP.

[0059] STOP: Stop the execution started with GO; in this regard, see also the preceding discussion of the STOP signal.

[0060] LOCAL RESET: Stop the execution and transfer from the “allocated” or “configured” state to the “not configured” state.

WAVE: Stop the execution of operations and load a wave reconfiguration supplied by the CT. In wave reconfiguration, one or more PAEs are to be reconfigured after the end of a data packet has run through them. Then the processing of another data packet should take place preferably directly after reconfiguration, which may also be performed as a differential reconfiguration. For example, a first audio data packet is to be processed with first filter coefficients; after running through the first audio data packet, a partial reconfiguration is to take place, and then a different audio data packet is to be processed with a second set of filter coefficients. To do so, the new reconfiguration data, e.g., the second filter coefficients, can be deposited in or at the cell, and the reconfiguration can be prompted automatically on recognition of the end of the first data packet without requiring further intervention of a CT or another external control unit for this, for example. Recognition of the end of the first data packet, i.e., the time when the reconfiguration is to be performed, can be accomplished by generating a wave reconfiguration trigger. It can be generated, for example, in a cell which recognizes a data end; reconfiguration then runs from cell to cell with the trigger as soon as the cells have concluded processing of the first data packet, comparable to a “wave” running through a soccer stadium. To do so, a single cell may generate the trigger and send it to a first cell, for example, to indicate to it that the end of a first packet has been run through. This first cell to be reconfigured, addressed by the wave trigger generating cell, also relays the wave trigger signal to one or more subsequently processing cells, simultaneously with the results derived from the last data of the first packet which are sent to those cells. The wave trigger signal can also be sent or relayed in particular to those cells which are not currently involved in processing the first data packet and/or do not receive any results derived from the last data. Then the first cell to be reconfigured, which is addressed by the wave trigger signal generating cell, is reconfigured and begins processing the data of the second data packet. During this period of time, the subsequent cells are also still processing the first data packet. It should be pointed out that the wave trigger signal generating cell can address not only individual cells, but also multiple cells which are to be reconfigured. This results in an avalanche-like propagation of the wave configuration.

[0061] Data processing is continued as soon as the wave reconfiguration has been configured completely. In WAVE, it is possible to select whether data processing is continued immediately after complete configuration or whether there is a wait for arrival of a STEP or GO.

[0062] SELECT: Selects an input bus for relaying to the output. Example: Either bus A or bus B is to be switched to an output. The setting of the multiplexer and thus the selection of the bus are determined by SELECT.

[0063] Triggers are used for the following actions within CTs, for example:

[0064] CONFIG: A configuration is to be configured by the CT into the PA.

[0065] PRELOAD: A configuration is to be preloaded by the CT into its local memory. Therefore, the configuration need not first be loaded upon receipt of CONFIG. This results in the effect of a predictable caching.

[0066] CLEAR: A configuration is to be deleted by the CT from its memory.

[0067] Incoming triggers reference a certain configuration. The corresponding method is described below.

[0068] The semantics is not assigned to a trigger signal in the network. Instead, a trigger represents only a state. The way this state is utilized by the respective receiving PAE is configured in the receiving PAE. In other words, the sending PAE sends only its status, and the receiving PAE generates the semantics valid for it. If several PAEs receive one trigger, different semantics may be used in each PAE, i.e., a different response occurs in each. For example, a first PAE may be stopped, and a second PAE may be reconfigured. If multiple PAEs send one trigger, the event generating the trigger may be different in each PAE.
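
The receiver-side assignment of semantics described in the preceding paragraph can be sketched in Python as follows. The class ReceivingPae, its semantics table and the integer trigger line numbers are hypothetical names introduced only for this illustration; the action names are taken from the list of trigger semantics above.

```python
class ReceivingPae:
    """Hypothetical PAE model: the meaning of a trigger line is part of the
    receiver's own configuration, not of the trigger signal."""

    def __init__(self, name, semantics):
        self.name = name
        # Assumed representation: trigger line number -> configured action.
        self.semantics = dict(semantics)
        self.actions = []

    def receive(self, line):
        """Interpret an incoming trigger according to the local configuration."""
        action = self.semantics.get(line)  # None if the line is not used here
        if action is not None:
            self.actions.append(action)
        return action


# The same trigger line 0 is wired to two PAEs with different configurations.
first = ReceivingPae("first", {0: "STOP"})
second = ReceivingPae("second", {0: "WAVE"})
responses = [pae.receive(0) for pae in (first, second)]
```

Here one and the same trigger stops the first PAE and prompts a wave reconfiguration in the second, matching the example given in the text.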

[0069] It should be pointed out that a wave reconfiguration and/or a partial reconfiguration can also take place in bus systems and the like. A partial reconfiguration of a bus can take place, for example, in reconfiguration by sections.

[0070] 3.2. System Status and Program Pointer

[0071] A system is a module or an interlinked group of modules, depending on the implementation. For managing an array of PAEs, which is designed to include several modules in the case of a system, it is not [necessary] to know the status or program pointer of each PAE. Several cases are differentiated below in order to explain this further:

[0072] PAEs as components not having a processor property. Such PAEs do not need their own program pointer. The status of an individual PAE is usually irrelevant, because only certain PAEs have a usable status (see PACT01, where the status represented by a PAE is not a program counter but instead is a data counter). The status of a group of PAEs is determined by the linking of the states of the individual relevant PAEs. In other words, the information within the network of triggers represents the status.

[0073] PAEs as processors. These PAEs have their own internal program pointer and status. Preferably only the information of one PAE which is relevant for the other PAEs or another PAE is exchanged by triggers.

[0074] The interaction among PAEs yields a common status which can be analyzed, e.g., in the CT, to determine how a reconfiguration is to take place. It is then optionally possible to take into account the instantaneous configuration of the network of lines and/or buses used to transmit the triggers, this network itself being optionally configurable.

[0075] The array of PAEs (PA) thus has a global status. The essential information is sent through certain triggers to the CT, which controls the program execution through reconfiguration on the basis of these triggers. It is especially noteworthy that there is no longer a program counter here.

[0076] 4. (Re)Configuration

[0077] VPU modules are configured or reconfigured on the basis of events. These events may be represented by triggers (CONFIG) transmitted to a CT. An incoming trigger references a certain configuration (SubConf) for certain PAEs. The referenced SubConf is sent to one or more PAEs. Referencing takes place by using a lookup system according to the related art or any other address conversion or address generation. For example, the address of the executing configuration (SubConf) could be calculated as follows on the basis of the number of an incoming trigger if the SubConfs have a fixed length: offset + (trigger number * SubConf length).
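
As a worked illustration of the address calculation named in the preceding paragraph, the following Python sketch evaluates offset + (trigger number * SubConf length). The function name subconf_address and the numeric values are assumptions chosen only for the example.

```python
def subconf_address(offset, trigger_number, subconf_length):
    """Start address of the SubConf referenced by a trigger, assuming all
    SubConfs have the same fixed length and are stored back to back."""
    return offset + trigger_number * subconf_length


# Assumed example values: SubConfs of 16 words stored from address 0x100.
address = subconf_address(offset=0x100, trigger_number=3, subconf_length=16)
# 0x100 + 3 * 16 = 0x130
```

With these assumed values, trigger number 0 references the SubConf at the offset itself, and each further trigger number advances the address by one SubConf length.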

[0078] VPU modules have three configuration modes:

[0079] a) Global configuration: The entire VPU is reconfigured. The entire VPU must be in a configurable state, i.e., it must be unconfigured.

[0080] b) Local configuration: A portion of the VPU is reconfigured. The local portion of the VPU which is to be reconfigured must be in a configurable state, i.e., it must be unconfigured.

[0081] c) Differential configuration: An existing configuration is modified. PAEs to be reconfigured must be in a configured state, i.e., they must be configured.

[0082] A configuration is composed of a quantity of configuration words (KWs). Each configuration can be referenced per se, e.g., by a reference number (ID), which may be unique if necessary.

[0083] A number of KWs identified by an ID is referred to below as a subconfiguration (SubConf). Several different as well as similar SubConfs, which run simultaneously on different PAEs, may be configured in a VPU.

[0084] A PAE may have one or more configuration registers, one configuration word (KW) describing one configuration register. A KW is basically assigned the address of the PAE to be configured. Likewise, information indicating the type of configuration is usually assigned to a KW. This information can be implemented technically by flags or by coding. Flags are described in detail below.

[0085] 4.1. ModuleID

[0086] For most operations, it is sufficient for the CT to know the allocation of a configuration word and of the respective PAE to a SubConf. For more complex operations in the processing array, however, the ID of the SubConf assigned to it should be stored in each PAE.

[0087] An ID stored in the PA is referred to below as moduleID to differentiate it from the IDs within the CTs. There are several reasons for introducing moduleID, some of which are described here:

[0088] A PAE may be switched only to a bus which also belongs to the corresponding SubConf. If a PAE is switched to the wrong (different) bus, this can result in processing of incorrect data. This problem can be solved essentially by configuring buses prior to PAEs, which leads to a rigid order of KWs within a SubConf. By introducing moduleID, this can be avoided by the fact that a PAE compares its stored moduleID with that of the buses assigned to it and switches to a bus only when its moduleID matches that of the bus. As long as the two moduleIDs are different, the bus connection is not established. As an alternative, a bus sharing management can also be implemented (see PACT07).

[0089] PAEs can be converted to the “unconfigured” state by a local reset signal. Local reset originates from a PAE in the array and not from a CT, and therefore is “local”. To do so, the signal must be connected between all PAEs of a SubConf. This procedure becomes problematical when a SubConf that has not yet been completely configured is to be deleted, and therefore not all PAEs are connected to local reset. By using moduleID, the CT can broadcast a command to all PAEs. PAEs with the corresponding moduleID change their status to “not configured”.

[0090] In many applications, a SubConf can be started only at a certain time, but it can already be configured in advance. By using the moduleID, the CT can broadcast a command to all PAEs. The PAEs with the corresponding moduleID then start the data processing.

[0091] Essentially, the moduleID may also be identical to the ID stored in the CT.

[0092] The moduleID is written into a configuration register in the respective PAE. Since IDs usually have a considerable width of more than 10 bits, it is not efficient to provide such a large register in each PAE.

[0093] Therefore, it is proposed that the moduleID of the respective SubConf be derived from the ID, that it have a small width and be unique. Since the number of all modules within a PA is typically comparatively small, a moduleID width of a few bits (e.g., 4 to 5 bits) is sufficient. The ID and moduleID can be mapped bijectively on one another. In other words, the moduleID uniquely identifies a configured module within an array at a certain point in time. The moduleID is issued to a SubConf before configuration in such a way that the SubConf is uniquely identifiable in the PA at the time of execution. A SubConf can be configured into the PA multiple times simultaneously (see macros, described below). A unique moduleID is issued for each configured SubConf for unambiguous allocation.

[0094] The transformation of an ID to a moduleID can take place by way of lookup tables or lists. Since numerous mapping methods are known for this purpose, only one possibility is explained in greater detail here:

[0095] A list whose length is 2^(moduleID width) contains the numbers of all IDs configured in the array at the moment, one ID being allocated to each list entry. The entry “0” characterizes an unused moduleID. If a new ID is configured, it must be assigned to a free list entry, whose address yields the corresponding moduleID. The ID is entered into the list at the moduleID address. On deletion of an ID, the corresponding list entry is reset to “0”.
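
The list-based mapping from ID to moduleID described in the preceding paragraph can be sketched in Python as follows. The class ModuleIdTable, its method names and the example ID values are hypothetical; the sketch also assumes that the ID value 0 is reserved, since the entry “0” marks a free slot.

```python
class ModuleIdTable:
    """Hypothetical model of the list: the index is the moduleID, the stored
    value is the ID of the SubConf currently configured under that moduleID."""

    FREE = 0  # the entry "0" characterizes an unused moduleID

    def __init__(self, width_bits):
        # The list length is 2 to the power of the moduleID width.
        self.entries = [self.FREE] * (1 << width_bits)

    def allocate(self, config_id):
        """Enter an ID into the first free slot and return its moduleID."""
        if config_id == self.FREE:
            raise ValueError("ID 0 is reserved for free entries (assumption)")
        for module_id, entry in enumerate(self.entries):
            if entry == self.FREE:
                self.entries[module_id] = config_id
                return module_id
        raise RuntimeError("no free moduleID available")

    def release(self, module_id):
        """Reset the list entry to 0 when the ID is deleted."""
        self.entries[module_id] = self.FREE

    def lookup(self, module_id):
        """Return the ID stored at a moduleID (0 if unused)."""
        return self.entries[module_id]


table = ModuleIdTable(width_bits=4)   # 16 possible moduleIDs
first = table.allocate(0x2A1)         # receives moduleID 0
second = table.allocate(0x2A1)        # same SubConf configured a second time
table.release(first)
third = table.allocate(0x17F)         # reuses the freed moduleID 0
```

Because each allocation takes its own list entry, a SubConf that is configured twice at the same time receives two distinct moduleIDs, in line with the requirement that every configured SubConf be uniquely identifiable in the PA.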

[0096] 4.2. PAE States

[0097] Each KW is provided with additional flags which check and control the status of a PAE:

[0098] CHECK: An unconfigured PAE is allocated and configured. If the status of the PAE is “not configured,” the PAE is configured with the KW. This process is acknowledged with ACK.

[0099] If the PAE is in the “configured” or “allocated” state, the KW is not accepted. The rejection is acknowledged with REJ.

[0100] After receipt of CHECK, a PAE is switched to an “allocated” state. Any additional CHECK is rejected, but data processing is not started.

[0101] DIFFERENTIAL: The configuration registers of a PAE that has already been configured are modified. If the status of the PAE is “configured” or “allocated,” then the PAE is modified using the KW. This process is acknowledged with ACK. If the PAE is in the “unconfigured” state, the KW is not accepted but is acknowledged by REJ (reject).

[0102] GO: Data processing is started. GO can be sent individually or together with CHECK or DIFFERENTIAL.

[0103] WAVE: A configuration is linked to the data processing. As soon as the WAVE trigger is received, the configuration characterized with the WAVE flag is loaded into the PAE. If WAVE configuration is performed before receipt of the trigger, the KWs characterized with the WAVE flag remain stored until receipt of the trigger and become active only with the trigger. If the WAVE trigger is received before the KW which has the WAVE flag, data processing is stopped until the KW is received.

[0104] At least CHECK or DIFFERENTIAL must be set for each KW transmitted. However, CHECK and DIFFERENTIAL are not allowed at the same time. CHECK and GO or DIFFERENTIAL and GO are allowed and will start data processing.
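
The interplay of the CHECK, DIFFERENTIAL and GO flags with the PAE states can be illustrated with the following Python sketch. It is only one possible reading of the rules above: the class FlagPae, the method configure, the register names, and in particular the assumption that GO moves a PAE from “allocated” to “configured” are not taken from the source. WAVE and LOCK are left out.

```python
from enum import Enum


class State(Enum):
    NOT_CONFIGURED = "not configured"
    ALLOCATED = "allocated"
    CONFIGURED = "configured"


class FlagPae:
    """Hypothetical PAE model that answers each KW with ACK or REJ."""

    def __init__(self):
        self.state = State.NOT_CONFIGURED
        self.registers = {}

    def configure(self, register, value, check=False, differential=False, go=False):
        # Exactly one of CHECK and DIFFERENTIAL must accompany each KW.
        if check == differential:
            raise ValueError("set exactly one of CHECK or DIFFERENTIAL")
        if check:
            # CHECK is accepted only by an unconfigured PAE.
            accepted = self.state is State.NOT_CONFIGURED
        else:
            # DIFFERENTIAL is accepted only by an allocated or configured PAE.
            accepted = self.state is not State.NOT_CONFIGURED
        if not accepted:
            return "REJ"
        self.registers[register] = value
        if check:
            self.state = State.ALLOCATED  # allocated, processing not yet started
        if go:
            self.state = State.CONFIGURED  # assumption: GO starts data processing
        return "ACK"


pae = FlagPae()
r1 = pae.configure("constant", 5, differential=True)           # unconfigured PAE
r2 = pae.configure("opcode", "add", check=True)                # allocates the PAE
r3 = pae.configure("opcode", "sub", check=True)                # second CHECK
r4 = pae.configure("constant", 5, differential=True, go=True)  # modify and start
```

In this sketch further KWs for an already allocated PAE are sent with DIFFERENTIAL, since any additional CHECK is rejected; whether an implementation proceeds this way is an assumption.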

[0105] In addition, a flag which is not assigned to any KW and is set explicitly by the CT is also implemented:

[0106] LOCK: A PAE cannot switch to the “not configured” state at will. If it could, a cell might still be configured and processing data while an attempt is made to write a first configuration from the FILMO memory into it, and then terminate its activity later during the same FILMO run. Without any additional measures, a second, following configuration, which is stored in FILMO and may actually be executed only after the first configuration, could then occupy this cell. This could result in DEADLOCK situations. The LOCK command avoids such a DEADLOCK by temporarily limiting changes in the configurability of the cell, so that the cell does not become configurable at an unwanted time. The cell can be locked against reconfiguration in one of two ways: either whenever FILMO is run through, regardless of whether the cell is in fact accessed for the purpose of reconfiguration, or only for a certain phase after the first unsuccessful access to the cell by a first configuration in the FILMO. The latter prevents inclusion of the second configuration only in those cells which are to be accessed by an earlier configuration.

[0107] Thus, according to the FILMO principle, a change is allowed in FILMO only during certain states. As discussed above, the FILMO state machine controls the transition to the “not configured” state through LOCK.

[0108] Depending on the implementation, the PAE transmits its instantaneous status to a higher-level control unit (e.g., the respective CT) or stores it locally.

[0109] Transition Tables

[0110] The simplest implementation of a state machine for observing the FILMO protocol is possible without using WAVE or CHECK/DIFFERENTIAL. Only the GO flag is implemented here, a configuration being composed of KWs transmitted together with GO. The following states can be implemented:

[0111] Not configured: The PAE behaves completely neutrally, i.e., it does not accept any data or triggers, nor does it send any data or triggers. It waits for a configuration. Differential configurations, if implemented, are rejected.

[0112] Configured: The PAE is configured and it processes data and triggers. Other configurations are rejected; differential configurations, if implemented, are accepted.

[0113] Wait for lock: The PAE receives a request for reconfiguration (e.g., through local reset or by setting a bit in a configuration register). Data processing is stopped, and the PAE waits for cancellation of LOCK to be able to change to the “not configured” state.

Current PAE status | Event | Next status
not configured | GO flag | configured
configured | Local Reset Trigger | wait for lock
wait for lock | LOCK flag | not configured

[0114] A complete state machine according to the method described here makes it possible to configure a PAE which requires several KWs. This is the case, for example, when a configuration which refers to several constants is to be transmitted, and these constants are also to be written into the PAE after or together with the actual configuration. Another status is required for this purpose.

[0115] Allocated: The PAEs have been checked by CHECK and are ready for configuration. In the allocated state, the PAE is not yet processing any data. Other KWs marked as DIFFERENTIAL are accepted. KWs marked with CHECK are rejected.

[0116] A corresponding transition table is shown below; WAVE is not yet implemented:

Current PAE status | Event | Next status
not configured | CHECK flag | allocated
not configured | GO flag | configured
allocated | GO flag | configured
configured | Local Reset Trigger | wait for lock
wait for lock | LOCK flag | not configured
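The transition table can be expressed directly as a lookup structure. The following sketch is illustrative only; the event names and the rule that an event without a table entry leaves the state unchanged are assumptions.

```python
# (current status, event) -> next status, taken from the table without WAVE.
TRANSITIONS = {
    ("not configured", "CHECK"): "allocated",
    ("not configured", "GO"): "configured",
    ("allocated", "GO"): "configured",
    ("configured", "LOCAL_RESET"): "wait for lock",
    # The table lists the LOCK flag as the event that completes the reset.
    ("wait for lock", "LOCK"): "not configured",
}


def step(state, event):
    """Return the next status; unlisted pairs keep the current status (assumption)."""
    return TRANSITIONS.get((state, event), state)


def run(events, state="not configured"):
    """Feed a sequence of events through the state machine."""
    for event in events:
        state = step(state, event)
    return state


full_cycle = run(["CHECK", "GO", "LOCAL_RESET", "LOCK"])
go_only = run(["GO", "LOCAL_RESET"])
```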

[0117] 4.2.1. Implementation of GO

[0118] GO is

[0119] either set immediately during the configuration of a PAE together with the KW in order to be able to start data processing immediately

[0120] or sent to the respective PAEs after conclusion of the entire SubConf.

[0121] The GO flag can be implemented in various ways:

[0122] a) Register

[0123] Each PAE has a register which is set at the start of processing. The technical implementation is comparatively simple, but a configuration cycle is required for each PAE. GO is transmitted together with the KW as a flag according to the previous description.

[0124] For the case when the order in which PAEs of different PACs belonging to one EnhSubConf are configured is important, another method is proposed to ensure that this chronological dependence will be maintained. By definition, there are also multiple CTs when there are multiple PACs. To exchange information, the CTs notify one another whether all PAEs which must be configured before the next one in each PAC have already accepted their GO from the same configuration.

[0125] One possibility of resolving the chronological dependencies and preventing unallowed GOs from being sent is to reassign the KWs so that a correct order is ensured by FILMO by the succession of their configurations. FILMO then marks, optionally by a flag for each configuration, whether all GOs of the current configuration have been accepted so far. If this is not the case, no additional GOs of this configuration are sent. With each new configuration, the initial status is as if all GOs so far had been accepted.

[0126] To increase the probability that certain PAEs are no longer configured at the time of configuration, the KWs of an at least partially sequential configuration can be re-sorted so that the KWs of the respective PAEs are configured at a later point in time. Likewise, if necessary, certain PAEs can be activated sooner by rearranging the KWs of the respective configuration so that the respective PAEs are configured at an earlier point in time. These methods can be used in particular if the order of the KWs is not already determined completely by time dependencies which must be maintained even after re-sorting.

[0127] b) Wiring by Conductor

[0128] As is the case in use of the local reset signal, PAEs are combined into groups which are to be started jointly. Within this group, all PAEs are connected to a line for distribution of GO. If one group is to be started, GO is signaled to a first PAE, which is implemented by sending a signal or setting a register (see a)) of this first PAE. From this PAE, GO is relayed to all the other PAEs. One configuration cycle is necessary for starting. For relaying, a latency time is needed to bridge great distances.

[0129] c) Broadcast

[0130] A modification of a) and b) offers a high performance (only one configuration cycle) with a comparatively low complexity.

[0131] For this purpose, all modules receive a moduleID which is usually different from the SubConfID.

[0132] The size of the moduleID should be kept as small as possible; a width of a few bits (3 to 5) should be sufficient, if possible. The use of moduleID is explained in greater detail below.

[0133] During configuration, the corresponding moduleID is written to each PAE.

[0134] GO is then started by a broadcast, by sending the moduleID together with the GO command to the array. The command is received by all PAEs, but is executed only by the PAEs having the proper moduleID.
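The broadcast variant can be pictured with the following minimal sketch. The class Pae, the function broadcast, and the field names are hypothetical and serve only to show that every PAE receives the command while only PAEs with a matching moduleID execute it.

```python
class Pae:
    """Hypothetical PAE that stores the moduleID written during configuration."""

    def __init__(self, name):
        self.name = name
        self.module_id = None
        self.running = False

    def configure(self, module_id):
        self.module_id = module_id

    def on_broadcast(self, command, module_id):
        # Every PAE sees the broadcast; only matching PAEs execute it.
        if command == "GO" and self.module_id == module_id:
            self.running = True


def broadcast(array, command, module_id):
    """Deliver one command together with a moduleID to all PAEs of the array."""
    for pae in array:
        pae.on_broadcast(command, module_id)


array = [Pae(f"pae{i}") for i in range(4)]
array[0].configure(2)
array[1].configure(2)
array[2].configure(5)
broadcast(array, "GO", 2)  # a single configuration cycle starts the group
started = [pae.name for pae in array if pae.running]
```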

[0135] 4.2.2. Locking the PAE Status

[0136] The status of a PAE must not change from “configured” to “not configured” within a configuration or a FILMO run. Example: Two different SubConfs (A, D) share the same resources, in particular, a PAE X. In FILMO, SubConf A precedes SubConf D in time. SubConf A must therefore occupy the resources before SubConf D. PAE X is “configured” at the configuration time of SubConf A, but it changes its status to “not configured” before the configuration of SubConf D. This results in a deadlock situation, because now SubConf A can no longer configure PAE X, but SubConf D can no longer configure the remaining resources which are already occupied by SubConf A. Neither SubConf A nor SubConf D can be executed. As mentioned previously, LOCK ensures that the status of a PAE does not change in an inadmissible manner during a FILMO run. For the FILMO principle it is irrelevant how the status is locked, but nevertheless several possibilities should be discussed.

[0137] Basic Principle of LOCK

[0138] Before beginning the first configuration and with each new run of FILMO, the status of the PAEs is locked. After the end of each run, the status is released again. Thus, certain changes in status are allowed only once per run.

[0139] Explicit LOCK

[0140] The lock signal is set only after the first REJ from the PA since the start of a FILMO run. This is possible because previously all the PAEs could be configured and thus already were in the “not configured” state. Only a PAE which generates a REJ could change its status from “configured” to “not configured” during the remainder of the FILMO run. A deadlock could occur only after this time, namely when a first KW receives a REJ and a later one is configured. However, the transition from “configured” to “not configured” is prevented by immediately setting LOCK after a REJ. The essential advantage of this method is that during the first run phase, PAEs can still change their status, which means in particular that they can change to the “not configured” state. If a PAE thus changes from “configured” to “not configured” during a run before a failed configuration attempt, then it can be configured in the same configuration phase.

[0141] Implicit LOCK

[0142] An even more efficient extension of the explicit LOCK is the implicit handling of LOCK within a PAE.

[0143] In general, only PAEs which have rejected (REJ) a configuration are affected by the lock status. Therefore, it is sufficient during a FILMO run to lock the status only within PAEs that have generated a REJ. All other PAEs remain unaffected. LOCK is no longer generated by a higher-level instance (CT). Instead, after a FILMO run, the lock status in the respective PAEs is canceled by a FREE signal. FREE can be broadcast to all PAEs directly after a FILMO run and can also be pipelined through the array.

[0144] Extended Transition Tables for Implicit LOCK:

[0145] A reject (REJ) generated by a PAE is stored locally in each PAE (REJD=rejected). The information is deleted only on return to “not configured.”

Current PAE status | Event | Next status
not configured | CHECK flag | allocated
not configured | GO flag | configured
allocated | GO flag | configured
configured | Local reset trigger and reject (REJD) | wait for free
configured | Local reset trigger and no reject (not REJD) | not configured
wait for free | FREE flag | not configured
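The extended table for implicit LOCK can be sketched as a small PAE model in which the reject is remembered locally. The class and method names below are assumptions made for illustration; only the transitions listed in the table are modeled.

```python
class Pae:
    """Hypothetical PAE with implicit LOCK: a generated REJ is stored as REJD."""

    def __init__(self):
        self.state = "not configured"
        self.rejd = False

    def check(self):
        if self.state == "not configured":
            self.state = "allocated"
            return "ACK"
        self.rejd = True  # remember locally that a configuration was rejected
        return "REJ"

    def go(self):
        if self.state in ("not configured", "allocated"):
            self.state = "configured"

    def local_reset(self):
        if self.state == "configured":
            # With REJD set the PAE must wait for FREE; otherwise it resets at once.
            self.state = "wait for free" if self.rejd else "not configured"

    def free(self):
        if self.state == "wait for free":
            self.state = "not configured"
            self.rejd = False  # REJD is deleted on return to "not configured"


locked = Pae()
locked.check()
locked.go()
reject = locked.check()       # a later SubConf is rejected, REJD is set
locked.local_reset()
state_before_free = locked.state
locked.free()

unlocked = Pae()
unlocked.check()
unlocked.go()
unlocked.local_reset()        # no REJ occurred, so the reset takes effect directly
```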

[0146] The transition tables are given as examples. An actual implementation will depend on the respective embodiment.

[0147] 4.2.3. Configuration of a PAE

[0148] This section describes the configuration sequence again from the standpoint of the CT. A PAE shall also be considered to include parts of a PAE if they manage the states described previously independently of one another.

[0149] If a PAE is to be reconfigured, the first KW must set the CHECK flag to check the status of the PAE. A configuration for a PAE is constructed so that either only one KW is configured:

CHECK | DIFFERENTIAL | GO | KW
X | - | * | KW0

[0150] or multiple KWs are configured, with CHECK being set with the first KW and DIFFERENTIAL being set with all additional KWs.

CHECK | DIFFERENTIAL | GO | KW
X | - | - | KW0
- | X | - | KW1
- | X | - | KW2
- | X | * | KWn

[0151] If CHECK is rejected (REJ), no subsequent KW with a DIFFERENTIAL flag is sent to the PAE. After CHECK is accepted (ACK), all additional CHECKs are rejected until the return to the state “not configured” and the PAE is allocated for the accepted SubConf. Within this SubConf, the next KWs are configured exclusively with DIFFERENTIAL. This is allowed because it is known by CHECK that this SubConf has access rights to the PAE.

[0152] 4.2.4 Resetting to the Status “Not Configured”

[0153] A specially designed trigger (local reset) causes the receiving PAEs to reset their state from “configured” to “not configured,” at the latest after a LOCK or FREE signal is received. Resetting may also be triggered by other sources, such as a configuration register.

[0154] Local reset can be relayed from the source generating the signal over all existing configurable bus connections, i.e., all trigger buses and all data buses, to each PAE connected to the buses. Essentially, each PAE receiving a local reset then in turn relays the signal over all the connected buses.

[0155] However, to prevent the local reset trigger from being relayed beyond the limit of a local group, it is possible to configure independently for each cell whether and over which connected buses the local reset is to be relayed.

[0156] 4.2.4.1. Deleting an Incompletely Configured SubConf

[0157] In some cases, it may happen that configuration of a SubConf is begun, but during configuration it is found that the SubConf is either not (no longer) needed or is not needed completely. Under some circumstances, local reset does not change the status of all PAEs to “not configured” because the bus has not yet been completely established. Two possibilities are proposed according to the present invention. In both methods, the PAE which would have generated the local reset sends a trigger to the CT. Then the CT informs the PAEs as follows:

[0158] 4.2.4.2. When Using ModuleID

[0159] If a possibility for storage of the moduleID is provided within each PAE, then each PAE can be requested to go to the status “not configured” with this specific ID by way of a simple broadcast in which the ID is also sent.

[0160] 4.2.4.3. When Using the GO Signal

[0161] If a GO line is wired in exactly the order in which the PAEs are configured, it is possible to assign a reset line to the GO line to set all the PAEs in the state “not configured.”

[0162] 4.2.4.4. Explicit Reset by the Configuration Register

[0163] In each PAE, a bit or a code is defined within the configuration register, so that when this bit or code is set by the CT, the PAE is reset to the state “not configured.”

[0164] 4.3. Holding the Data in the PAEs

[0165] It is especially advantageous with modules of the generic type according to the present invention if the data and states of a PAE can be held beyond a reconfiguration. In other words, it is possible to preserve data stored within a PAE despite reconfiguration. Through appropriate information in the KWs, it is defined for each relevant register whether it is reset by the reconfiguration.

EXAMPLE

[0166] If a bit within a KW is logical 0, for example, the current register value of the respective data register or status register is retained, and a logical 1 resets the value of the register. A corresponding KW might then have the following structure:

Register group | Bits
Input register | A, B, C
Output register | H, L
Status flags | equal/zero, overflow

[0167] It is thus possible with each reconfiguration to select whether or not the data will be preserved.
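The retain/reset selection from the example can be sketched as a bit mask applied to the registers during reconfiguration. The function name, the bit order, and the reset value 0 are assumptions made for illustration only.

```python
# Assumed bit order of the KW: input registers, output registers, status flags.
REGISTER_ORDER = ["A", "B", "C", "H", "L", "equal/zero", "overflow"]


def apply_reset_mask(registers, mask_bits):
    """Return the register contents after reconfiguration.

    A mask bit of 0 retains the current value; a mask bit of 1 resets the
    register (to 0 in this sketch, which is an assumption).
    """
    if len(mask_bits) != len(REGISTER_ORDER):
        raise ValueError("one mask bit per register is required")
    return {
        name: (0 if bit else registers[name])
        for name, bit in zip(REGISTER_ORDER, mask_bits)
    }


before = {"A": 7, "B": 8, "C": 9, "H": 1, "L": 2, "equal/zero": 1, "overflow": 1}
after = apply_reset_mask(before, [0, 1, 0, 1, 1, 0, 1])
```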

[0168] 4.4. Setting Data in the PAEs

[0169] In addition, it is possible for the CT to write data into the registers of the PAEs during reconfiguration. The relevant registers can be addressed by KWs. A separate bit indicates whether the data is to be treated as a constant or as a data word.

[0170] A constant is retained until it is reset.

[0171] A data word is valid for precisely a certain number of counts, e.g., precisely one count. After processing the data word, the data word written to the register by the CT no longer exists.

[0172] 5. Extensions

[0173] The bus protocol can be extended by also pipelining the KWs and ACK/REJ signals through registers.

[0174] This is especially advantageous and is regarded as patentable by itself or in combination with other features.

[0175] One KW or multiple KWs can be sent in each clock cycle. The FILMO principle is to be maintained. The basic principle provides for establishing an allocation to a KW written to the PA in such a way that the delayed acknowledgment is allocated subsequently to the KW. KWs depending on the acknowledgment are re-sorted so that they are processed only after receipt of the acknowledgment.

[0176] The alternative methods described below meet these requirements and have various advantages.

[0177] 5.1. Lookup Tables (STATELUT)

[0178] Each PAE sends its status to a lookup table (STATELUT) which is implemented locally in the CT. In sending a KW, the CT checks the status of the addressed PAE via a lookup in the STATELUT. The acknowledgment (ACK/REJ) is generated by the STATELUT.

[0179] This method functions as follows in detail.

[0180] In a CT, the status of each individual PAE is managed in a memory or a register set. For each PAE there is an entry indicating in which mode (“configured,” “not configured”) the PAE is. On the basis of this entry, the CT checks whether the PAE can be reconfigured. This status is checked internally by the CT, i.e., without checking back with the PAEs. Each PAE sends its status independently or after a request, depending on the implementation, to the internal STATELUT within the CT. When LOCK is set or there is no FREE signal, no changes in status are sent by the PAEs to the STATELUT and none are received by the STATELUT.

[0181] The status of the PAEs is monitored by a simple mechanism, with the mechanisms of status control and the known states that have already been described being implemented.

[0182] Setting the “Configured” Status

[0183] When writing a KW provided with a CHECK flag, the PAE addressed is checked in the STATELUT.

[0184] If the PAE is in a reconfigurable state, the PAE is marked as “allocated” in the STATELUT.

[0185] As soon as the PAE is started (GO), the PAE is entered as “configured.”

[0186] Resetting the “Configured” Status to “not Configured”

[0187] Several methods can be used, depending on the application and implementation:

[0188] a) Each PAE sends a status signal to the table when its status changes from “configured” to “not configured.” This status signal can be sent pipelined.

[0189] b) A status signal (local reset) is sent for a group of PAEs, indicating that the status has changed from “configured” to “not configured” for the entire group. All the PAEs belonging to the group are selected according to a list, and the status for each individual PAE is changed in the table. It is essential that the status signal is sent to the CT from the last PAE of a group removed by a local reset signal. Otherwise, there may be inconsistencies between the STATELUT and the actual status of the PAEs in that the STATELUT lists a PAE as “not configured” although it is in fact still in a “configured” state.

[0190] c) After receipt of a LOCK signal, preferably pipelined, each PAE whose status has changed since the last receipt of LOCK sends its status to the STATELUT. LOCK here receives the “TRANSFER STATUS” semantics. However, PAEs transmit their status only after this request, and otherwise the status change is locked, so the method remains the same except for the inverted semantics.

[0191] To check the status of a PAE during configuration, the STATELUT is queried when the address of the target PAE of a KW is sent and an ACK or REJ is generated accordingly. A KW is sent to a PAE only if no REJ has been generated or if the DIFFERENTIAL flag has been set.

[0192] This method ensures the chronological order of KWs. Only valid KWs are sent to the PAEs. One disadvantage here is the complexity of the implementation of the STATELUT and the resending of the PAE states to the STATELUT as well as the bus bandwidth and running time required for this.
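The STATELUT method can be pictured with the following minimal sketch, in which the CT generates ACK/REJ from its own table. The class StateLut and its methods are hypothetical; in particular, buffering a status report while LOCK is set and applying it on release is an assumption standing in for the PAE withholding its status change (variant a) is modeled).

```python
class StateLut:
    """Hypothetical CT-internal lookup table holding the status of each PAE."""

    def __init__(self, pae_addresses):
        self.status = {addr: "not configured" for addr in pae_addresses}
        self.locked = False
        self.pending = set()  # status changes held back while LOCK is set

    def query(self, addr, check=False, differential=False, go=False):
        """Generate ACK or REJ inside the CT, without asking the PAE."""
        current = self.status[addr]
        if check:
            if current != "not configured":
                return "REJ"
            self.status[addr] = "allocated"
        elif differential and current == "not configured":
            return "REJ"
        if go:
            self.status[addr] = "configured"
        return "ACK"

    def report_unconfigured(self, addr):
        """Status signal of a PAE that has returned to 'not configured'."""
        if self.locked:
            self.pending.add(addr)
        else:
            self.status[addr] = "not configured"

    def unlock(self):
        """Release the lock and accept the status changes held back so far."""
        self.locked = False
        for addr in self.pending:
            self.status[addr] = "not configured"
        self.pending.clear()


lut = StateLut(["pae0", "pae1"])
answers = [lut.query("pae0", check=True, go=True), lut.query("pae0", check=True)]
lut.locked = True                       # a FILMO run is in progress
lut.report_unconfigured("pae0")
answers.append(lut.query("pae0", check=True))   # still seen as configured
lut.unlock()
answers.append(lut.query("pae0", check=True))   # now accepted
```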

[0193] 5.2. Re-Sorting the KWs

[0194] The use of the CHECK flag for each first KW (KW1) sent to a PAE is essential for use of the following method.

[0195] The SubConf is re-sorted as follows:

[0196] 1. First, KW1 of a first PAE is written. The time (DELAY) until the receipt of the acknowledgment (ACK/REJ) is filled with exactly as many dummy cycles (NOPs) as cycles elapse.

[0197] 2. Then the KW1 of a second PAE is written. During DELAY the remaining KWs of the first PAE can be written. Any remaining cycles are filled with dummy cycles. The configuration block from KW1 until the expiration of DELAY is called an atom.

[0198] 3. The same procedure is followed with each additional PAE.

[0199] 4. If more KWs are written in the case of a PAE than there are cycles during DELAY, the remaining portion is distributed among the following atoms. As an alternative, the DELAY can also be actively lengthened, so a larger number of KWs can be written in the same atom.

[0200] Upon receipt of ACK for a KW1, all additional KWs for the corresponding PAE are configured. If the PAE acknowledges this with REJ, no other KW pertaining to the PAE is configured.

[0201] This method guarantees that the proper order will be maintained in configuration.
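The re-sorting into atoms can be sketched as a static scheduling step. The function resort and its data layout are assumptions for illustration; the sketch only builds the cycle-by-cycle order and does not model skipping the KWs of a PAE after a REJ.

```python
from collections import deque


def resort(subconf, delay):
    """Arrange KWs into atoms of one KW1 followed by `delay` further cycles.

    subconf is a list of (pae, [kw1, kw2, ...]) pairs. The remaining KWs of a
    PAE become eligible only after the atom of their KW1 has ended, so the
    acknowledgment of KW1 has had time to arrive. Unused cycles become NOPs.
    """
    schedule = []
    backlog = deque()  # KWs of earlier PAEs waiting for a free cycle
    for pae, kws in subconf:
        schedule.append((pae, kws[0]))  # KW1 with CHECK opens the atom
        for _ in range(delay):
            schedule.append(backlog.popleft() if backlog else "NOP")
        backlog.extend((pae, kw) for kw in kws[1:])
    schedule.extend(backlog)  # leftovers follow after the last atom
    return schedule


plan = resort([("P0", ["a0", "a1", "a2"]), ("P1", ["b0", "b1"])], delay=2)
```

In this example the first atom consists of KW1 of P0 and two NOPs, because no earlier KWs exist to fill the waiting time; the second atom is filled completely with the remaining KWs of P0.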

[0202] It is a disadvantage that the optimum configuration speed cannot be achieved. To maintain the proper order, the waiting time of an atom may have to be filled with dummy cycles (NOPs), so the usable bandwidth is reduced and the size of a SubConf is increased by the NOPs.

[0203] In addition, there is an unresolvable paradox that restricts the configuration speed. To minimize the amount of configuration data and configuration cycles, the number of configuration registers must be minimized. At higher frequencies, however, DELAY necessarily becomes larger, which collides with the requirement that DELAY be used appropriately by filling it with KWs.

[0204] Therefore, this method seems to be appropriate for use only in serial transmission of configuration data. Due to the serialization of KWs, the data stream is long enough to fill up the waiting time.

[0205] 5.3. Analyzing the ACK/REJ Acknowledgment with Latency (CHECK, ACK/REJ)

[0206] The CHECK signal is sent to the addressed PAE with the KWs over one or more pipeline stages. The addressed PAE acknowledges (ACK/REJ) this to the CT, also pipelined.

[0207] In each cycle, a KW is sent whose acknowledgment (ACK/REJ) is received by the CT n cycles later and is analyzed. However, during this period of time, n additional KWs are sent. This results in two problem areas:

[0208] Controlling the FILMO

[0209] Maintaining the sequence of KWs

[0210] 5.3.1. Controlling the FILMO

[0211] Within the FILMO, it must be noted which KWs have been accepted by a PAE (ACK) and which have been rejected (REJ). Rejected KWs are sent again in a later FILMO run. In this later run, it is appropriate for reasons of configuration efficiency to run through only the KWs that have been rejected.

[0212] The requests described here can be implemented as follows: Another memory (RELJMP) which has the same depth as the FILMO is assigned to the FILMO. A first counter (ADR_CNT) points to the address of the KW in the FILMO being written at the moment into the PAE array. A second counter (ACK/REJ_CNT) points to the position of the KW in the FILMO whose acknowledgment (ACK/REJ) is returning from the array at that moment. A register (LASTREJ) stores the value of ACK/REJ_CNT which points to the address of the last KW whose configuration was acknowledged with REJ. A subtractor calculates the difference between ACK/REJ_CNT and LASTREJ. On occurrence of a REJ, this difference is written into the memory location having the address LASTREJ in the memory RELJMP.

[0213] RELJMP thus contains the relative jump width between a rejected KW and the following KW.

[0214] 1. A RELJMP entry of “0” (zero) is assigned to each accepted KW.

[0215] 2. A RELJMP entry of “>0” (greater than zero) is assigned to each rejected KW. The address of the next rejected KW is calculated in the FILMO by adding the RELJMP entry to the current address.

[0216] 3. A RELJMP entry of “0” (zero) is assigned to the last rejected KW, indicating the end. The memory location of the first address of a SubConf is occupied by a NOP in the FILMO. The associated RELJMP contains the relative jump to the first KW to be processed.

[0217] 1. In the first run of the FILMO, the value is “1” (one).

[0218] 2. In a subsequent run, the value points to the first KW to be processed, so it is “>0” (greater than zero).

[0219] 3. If all KWs of the SubConf have been configured, the value is “0” (zero), by which the state machine determines that the configuration has been completely processed.
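The RELJMP bookkeeping can be sketched as follows. This is a simplified model under stated assumptions: the acknowledgment latency is ignored, so ADR_CNT and ACK/REJ_CNT coincide in the single variable addr; the function name filmo_run is hypothetical; and the initial RELJMP contents of the first run (a jump of 1 for every entry except the last) are an assumption derived from the value “1” stated for the first address.

```python
def filmo_run(filmo, reljmp, try_configure):
    """Perform one FILMO run and rebuild the chain of rejected KWs in reljmp.

    filmo[0] is the leading NOP; reljmp[0] holds the jump to the first KW to
    be processed. try_configure(kw) returns True for ACK and False for REJ.
    """
    last_rej = 0          # LASTREJ starts at the leading NOP
    addr = 0
    jump = reljmp[0]
    reljmp[0] = 0         # stays 0 if nothing is rejected in this run
    attempted = []
    while jump > 0:
        addr += jump                  # follow the relative jump
        jump = reljmp[addr]           # read the old link before overwriting it
        reljmp[addr] = 0              # accepted KWs and the last REJ keep 0
        attempted.append(filmo[addr])
        if not try_configure(filmo[addr]):
            reljmp[last_rej] = addr - last_rej  # difference written at LASTREJ
            last_rej = addr
    return attempted


filmo = ["NOP", "k1", "k2", "k3", "k4"]
reljmp = [1, 1, 1, 1, 0]              # assumed initial contents for the first run
busy = {"k2", "k4"}                   # KWs whose target PAEs currently reject


def accept(kw):
    return kw not in busy


run1 = filmo_run(filmo, reljmp, accept)
after_run1 = list(reljmp)
busy.discard("k2")
run2 = filmo_run(filmo, reljmp, accept)
after_run2 = list(reljmp)
busy.clear()
run3 = filmo_run(filmo, reljmp, accept)
```

After the third run the entry at the first address is zero, which corresponds to the condition by which the state machine recognizes that the SubConf has been processed completely.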

[0220] 5.3.2. Observing the Sequence (BARRIER)

[0221] In the method described in section 5.3, it is impossible to guarantee a certain configuration sequence. This method only ensures the FILMO requirements according to 2.1 a)-c).

[0222] In certain applications, it is relevant to observe the configuration sequence within a SubConf (2.1 e)) and to maintain the configuration sequence of the individual SubConfs themselves (2.1 d)).

[0223] Observing sequences is achieved by partitioning a SubConf into multiple blocks. A token (BARRIER) is inserted between individual blocks, and can be skipped only if none of the preceding KWs has been rejected (REJ).

[0224] If the configuration reaches a BARRIER, and REJ has occurred previously, the BARRIER must not be skipped. A distinction is made between at least two types of barriers:

[0225] a) Nonblocking: The configuration is continued with the following SubConf.

[0226] b) Blocking: The configuration is continued with additional runs of the current SubConf. BARRIER is not skipped until the current SubConf has been configured completely.

[0227] Considerations on optimization of the configuration speed: It is not normally necessary to observe the sequence of the configuration of the individual KWs. However, the sequence of activation of the individual PAEs (GO) must be observed exactly. The speed of the configuration can be increased by re-sorting the KWs so that all the KWs in which the GO flag has not been set are pulled before the BARRIER. Likewise, all the KWs in which the CHECK flag has been set must be pulled before the BARRIER. If a PAE is configured with only one KW, the KW must be split into two words, the CHECK flag being set before the BARRIER and the GO flag after the BARRIER.

[0228] At the BARRIER it is known whether all CHECKs have been acknowledged with ACK. Since a reject (REJ) occurs only when the CHECK flag is set, all KWs behind the barrier are essentially executed in the correct order. The KWs behind the barrier are run through exactly once, and the start of the individual PAEs takes place properly.
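The handling of a BARRIER can be illustrated by the following sketch of one run through a SubConf. The function run_subconf, the string token "BARRIER", and the list representation are assumptions; the example corresponds to a blocking barrier, because the same SubConf is simply run again with the remaining entries.

```python
def run_subconf(items, try_configure):
    """Run once through a SubConf and return what remains for the next run.

    items holds KWs and the token "BARRIER". A BARRIER is skipped only if no
    preceding KW has been rejected; otherwise it and everything behind it
    stay in place for a later run.
    """
    rejected_seen = False
    remaining = []
    for index, item in enumerate(items):
        if item == "BARRIER":
            if rejected_seen:
                remaining.extend(items[index:])  # barrier must not be skipped
                break
            continue
        if not try_configure(item):
            rejected_seen = True
            remaining.append(item)
    return remaining


busy = {"check1"}  # KWs whose target PAE currently answers with REJ
sent = []


def try_configure(kw):
    sent.append(kw)
    return kw not in busy


first = run_subconf(["check0", "check1", "BARRIER", "go0", "go1"], try_configure)
busy.clear()
second = run_subconf(first, try_configure)
```

The KWs behind the barrier are sent exactly once and only after every KW in front of it has been accepted, which is the property the barrier is meant to guarantee.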

[0229] 5.3.3. Garbage Collector

[0230] Two different implementations of a garbage collector (GC) are suggested for the method according to 5.3:

[0231] a) A GC implemented as an algorithm or a simple state machine: At the beginning, two pointers point to the starting address of the FILMO: a first pointer (read pointer) points to the current KW to be read by the GC, and a second pointer (write pointer) points to the position to which the KW is to be written. The read pointer is incremented linearly. Each KW whose RELJMP is not equal to “0” (zero) is written to the write pointer address. RELJMP is set at “1” and the write pointer is incremented.

[0232] b) The GC is integrated into the FILMO by adding a write pointer to the readout pointer of the FILMO. At the beginning of the FILMO run, the write pointer points to the first entry. Each KW that has been rejected with a REJ in configuration of a PAE is written to the memory location to which the write pointer points. Then the write pointer is incremented. An additional FIFO-like memory (e.g., including a shift register) is necessary to temporarily store the KW sent to a PAE in the proper order until the ACK/REJ belonging to the KW is received by the FILMO again. Upon receipt of an ACK, the KW is ignored. Upon receipt of REJ, the KW is written to the memory location to which the write pointer is pointing (as described above). The memory of the FILMO is then preferably designed as a multiport memory here. In this method, there is a new memory structure at the end of each FILMO run, with the unconfigured KWs standing in linear order at the beginning of the memory. No additional GC runs are necessary. Likewise, implementation of RELJMP and the respective logic can be completely omitted.

[0233] 5.4. Prefetching of the ACK/REJ Acknowledgment with Latency

[0234] In conclusion, a refinement of 5.3 is to be described. The disadvantage of this method is the comparatively long latency time, corresponding to three times the length of the pipeline.

[0235] The addresses and/or flags of the respective PAEs to be configured are sent on a separate bus system before the actual configuration. The timing is designed so that at the time the configuration word is to be written into a PAE, its ACK/REJ information is available. If acknowledged with ACK, the CONFIGURATION is performed; in the case of a reject (REJ), the KWs are not sent to the PAE (ACK/REJ-PREFETCH). The FILMO protocol, in particular LOCK, ensures that there will be no unallowed status change of the PAEs between ACK/REJ-PREFETCH and CONFIGURATION.

[0236] 5.4.1. Structure of FILMO

[0237] FILMO functions as follows: KWs are received in the correct order, either (i) from the memory of the CT or (ii) from the FILMO memory.

[0238] The PAE addresses of the KWs read out are sent to the PAEs, pipelined through a first bus system. The complete KWs are written to a FIFO-like memory having a fixed delay time (which may also be designed as a shift register, for example).

[0239] The respective PAE addressed acknowledges this by sending ACK or REJ, depending on its status. The depth of the FIFO corresponds to the number of cycles that elapse between sending the PAE address to a PAE and receipt of the acknowledgment of the PAE. The cycle from sending the address to a PAE until the acknowledgment of the PAE is received is known as prefetch.

[0240] Due to the fixed delay in the FIFO-like memory, which is not identical to FILMO here, the acknowledgment of a PAE is received at the CT exactly at the time when the KW belonging to the PAE appears at the output of the FIFO. Upon receipt of ACK, the KW is sent to the PAE and no acknowledgment is expected. The PAE status has not changed in an inadmissible manner in the meantime, so that acceptance is guaranteed.

[0241] Upon receipt of REJ, the KW is not sent to the PAE but instead is written back into the FILMO memory. An additional pointer is available for this, which points to the first address at the beginning of linear readout of the FILMO memory. This pointer is incremented with each value written back to the memory. In this way, rejected KWs are automatically packed linearly, which corresponds to an integrated garbage collector run (see also 5.3). This implementation is especially advantageous and is regarded as patentable by itself.
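The prefetch mechanism with a fixed-delay FIFO can be sketched as follows. This is only a simplified model: the function prefetch_run, the dictionary of PAE states, and the assumption that every KW carries the CHECK flag are made for illustration, and the acknowledgment is modeled as a second queue with the same delay as the FIFO.

```python
from collections import deque


def prefetch_run(kws, pae_status, latency):
    """Send PAE addresses ahead, delay the KWs, and pack rejected KWs linearly.

    kws is a list of (pae_address, word) pairs, each assumed to carry CHECK.
    Returns the KWs actually sent to PAEs and the KWs written back to the
    FILMO memory in linear order.
    """
    fifo = deque()   # complete KWs, delayed by `latency` cycles
    acks = deque()   # acknowledgments returning with the same delay
    sent, written_back = [], []
    for item in list(kws) + [None] * latency:   # extra cycles flush the FIFO
        fifo.append(item)
        if item is None:
            acks.append(None)
        elif pae_status[item[0]] == "not configured":
            pae_status[item[0]] = "allocated"   # the accepted CHECK allocates
            acks.append("ACK")
        else:
            acks.append("REJ")
        if len(fifo) > latency:
            kw, ack = fifo.popleft(), acks.popleft()
            if kw is None:
                continue
            if ack == "ACK":
                sent.append(kw)           # no further acknowledgment expected
            else:
                written_back.append(kw)   # write pointer advances by one
    return sent, written_back


status = {"p0": "not configured", "p1": "configured", "p2": "not configured"}
sent, written_back = prefetch_run(
    [("p0", "A"), ("p1", "B"), ("p2", "C")], status, latency=2
)
```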

[0242] 5.4.2. Sending and Acknowledging Over a Register Pipeline

[0243] The method described here is used to ensure a uniform clock delay between messages sent and responses received if different numbers of registers are connected between one transmitter and multiple possible receivers of messages. One example of this would be if receivers are located at different distances from the transmitter. The message sent may reach nearby receivers sooner than more remote receivers.

[0244] To achieve the same transit time for all responses, the response is not sent back directly by the receiver, but instead is sent further, to the receiver at the greatest distance from the sender. This path must have the exact number of receivers so that the response will be received at the time when a response sent simultaneously with the first message would be received at this point. From here out, the return takes place exactly as if the response were generated in this receiver itself.

[0245] It does not matter here whether the response is actually sent to the most remote receiver or whether it is sent to another chain having registers with the same time response.
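The effect of routing every response via the most remote receiver can be checked with a short calculation. The sketch assumes one clock cycle per register stage and a return path from the far end that is as long as the forward path; the function name and parameters are hypothetical.

```python
def response_delay(distance, farthest):
    """Cycles from sending a message until its response returns to the sender.

    distance: register stages between the sender and the responding receiver.
    farthest: register stages between the sender and the most remote receiver.
    """
    forward = distance             # message travels to the receiver
    onward = farthest - distance   # response continues to the far end
    back = farthest                # and returns from there to the sender
    return forward + onward + back


delays = [response_delay(d, farthest=5) for d in (1, 3, 5)]
direct = [2 * d for d in (1, 3, 5)]  # for comparison: responding directly
```

Under these assumptions every response arrives after twice the number of stages to the most remote receiver, independent of where it was generated, whereas direct responses would arrive at different times.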

[0246] 6. Hierarchical CT Protocol

[0247] As described in PACT10, VPU modules are scalable by constructing a tree of CTs, the lowest CTs (low-level CTs) of the PAs being arranged on its leaves. A CT together with the PA assigned to it is known as a PAC. In general, any desired data or commands can be exchanged between CTs. Any technically appropriate protocol can be used for this purpose.

[0248] However, if the communication (inter-CT communication) causes SubConfs to start on various low-level CTs within the CT tree (CTTREE), the requirements of the FILMO principle should be ensured to guarantee freedom from deadlock.

[0249] In general, two cases are to be distinguished:

[0250] 1. In the case of a low-level CT, the start of a SubConf which runs only locally on the low-level CT and the PA assigned to it is requested. This case can be processed at any time within the CTTREE and does not require any special synchronization with other low-level CTs.

[0251] 2. In the case of a low-level CT, the start of a configuration running on multiple low-level CTs and the PAs assigned to them is requested. In this case, it is important to be sure that the configuration is called up “atomically” on all the CTs involved, i.e., indivisibly. The simplest way to ensure this is to guarantee that no other SubConf is started during call-up and start of a certain SubConf. Such a protocol is known from PACT10. However, a protocol that is even more optimized would be desirable.

[0252] The protocol from PACT10 is inefficient as soon as a pipelined transmission at higher frequencies is necessary, because bus communication is subject to a long latency time.

[0253] One method is described in the following sections.

[0254] The main function of inter-CT communication is to ensure that SubConfs involving multiple PACs (enhanced subconfiguration=EnhSubConf) are started without deadlock. EnhSubConfs are SubConfs that are not just executed locally on one PAC but instead are distributed among multiple PACs. An EnhSubConf includes multiple SubConfs, each started by way of the low-level CTs involved. A PAC is understood to be a PAE group having at least one CT. To ensure freedom from deadlock, the following requirement must be met.

[0255] In order for multiple EnhSubConfs to be able to run on identical PACs without deadlock, a prioritization of their execution is defined by a suitable mechanism (for example, within the CTTREE). If SubConfs are to be started from multiple different EnhSubConfs running on one and the same PACs, then these SubConfs are started on the respective PACs in the chronological order corresponding to the prioritization of the EnhSubConf.

[0256] Example: Two EnhSubConfs are to be started, namely EnhSubConf-A on PACs 1, 3, 4, 6 and EnhSubConf-B on PACs 3, 4, 5, 6. It is important to ensure that EnhSubConf-A is always configured on PACs 3, 4 and 6 exclusively either before or after EnhSubConf-B. For example, if EnhSubConf-A is configured before EnhSubConf-B on PACs 3 and 4, and if EnhSubConf-A is to be configured on PAC 6 after EnhSubConf-B, a deadlock occurs because EnhSubConf-A could not be started on PAC 6, and EnhSubConf-B could not be started on PACs 3 and 4. Such a case is referred to below as crossed or a cross.
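
The example can be restated as a simple consistency check. The following Python sketch is an illustration only; the function has_cross and the representation of the start order as one list per PAC are assumptions, not part of the protocol.

```python
from itertools import combinations
from typing import Dict, List


def has_cross(order_per_pac: Dict[int, List[str]]) -> bool:
    """True if two EnhSubConfs are started in opposite order on two PACs."""
    first_seen = {}  # unordered pair -> direction observed on the first PAC
    for order in order_per_pac.values():
        # combinations() keeps list order, so 'earlier' precedes 'later'.
        for earlier, later in combinations(order, 2):
            pair = tuple(sorted((earlier, later)))
            direction = earlier == pair[0]
            if first_seen.setdefault(pair, direction) != direction:
                return True
    return False


# EnhSubConf-A on PACs 1, 3, 4, 6 and EnhSubConf-B on PACs 3, 4, 5, 6.
crossed = {1: ["A"], 3: ["A", "B"], 4: ["A", "B"], 5: ["B"], 6: ["B", "A"]}
uncrossed = {1: ["A"], 3: ["A", "B"], 4: ["A", "B"], 5: ["B"], 6: ["A", "B"]}
```

In the first assignment PAC 6 starts EnhSubConf-B before EnhSubConf-A while PACs 3 and 4 do the opposite, which is the cross described above; the second assignment keeps one order on all shared PACs.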

[0257] It is sufficient to prevent EnhSubConfs from crossing. If there is an algorithmic dependence between two EnhSubConfs, i.e., if one EnhSubConf must be started after the other on the basis of the algorithm, this is normally resolved by having one EnhSubConf start the other.

[0258] Embodiment of the protocol according to the present invention: Inter-CT communication distinguishes between two types of data:

[0259] a) a SubConf containing the configuration information,

[0260] b) an ID chain containing a list of IDs to be started, together with the information regarding on which PAC the SubConf referenced by the ID is to be started. One EnhSubConf is translated to the individual SubConfs to be executed by an ID chain: ID_(EnhSubConf) → ID chain {(PAC₁: ID_(SubConf1)), (PAC₂: ID_(SubConf2)), (PAC₃: ID_(SubConf3)), . . . (PAC_(n): ID_(SubConfn))}

[0261] Inter-CT communication differentiates between the following transmission modes:

[0262] REQUEST: The start of an EnhSubConf is requested by a low-level CT from the higher-level CT, or by a higher-level CT from a CT at an even higher level. This is repeated until reaching a CT which has stored the ID chain or reaching the root CT, which always has the ID chain in memory.

[0263] GRANT: A higher-level CT orders a lower-level CT to start a SubConf. This may be either a single SubConf or multiple SubConfs, depending on the ID chain.

[0264] GET: A CT requests a SubConf from a higher-level CT by sending the proper ID. If the higher-level CT has stored (cached) the SubConf, it sends this to the lower-level CT; otherwise, it requests the SubConf from an even higher-level CT and sends it to the lower-level CT after receipt. The root CT at the latest will have stored the SubConf.

[0265] DOWNLOAD: Loading a SubConf into a lower-level CT.
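
The interplay of GET and DOWNLOAD can be sketched as a recursive lookup. The Python class below is a hypothetical model: the class name CachingCT is an assumption, and so is the choice that every CT on the path keeps a copy of the SubConf it relays, which the text above permits but does not require.

```python
from typing import Dict, Optional


class CachingCT:
    """Hypothetical CT that answers GET from its cache or from its parent."""

    def __init__(self, name: str, parent: Optional["CachingCT"] = None,
                 store: Optional[Dict[str, str]] = None) -> None:
        self.name = name
        self.parent = parent
        self.cache: Dict[str, str] = dict(store or {})

    def get(self, subconf_id: str) -> str:
        """GET: return the SubConf, fetching it from higher levels if needed."""
        if subconf_id not in self.cache:
            if self.parent is None:
                # The root CT is expected to hold every SubConf.
                raise KeyError(f"root CT has no SubConf {subconf_id!r}")
            # DOWNLOAD from the higher-level CT; keeping a copy is an assumption.
            self.cache[subconf_id] = self.parent.get(subconf_id)
        return self.cache[subconf_id]
```

A second GET for the same ID from the same low-level CT is then answered locally without traversing the CTTREE.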

[0266] REQUEST activates the CTTREE either until reaching the root CT, i.e., the highest CT in the CTTREE, or until a CT in the CTTREE has stored the ID chain. The ID chain can only be stored by a CT which contains all the CTs included in the list of the ID chain as leaves or branches. In principle, the root CT (CTR, see PACT10) has access to the ID chain in its memory. GRANT is then sent to all CTs listed in the ID chain. GRANT is sent “atomically,” i.e., all the branches of a CT receive GRANT either simultaneously or sequentially but without interruption by any other activity between one of the respective CTs and any other CT which could have an influence on the sequence of the starts of the SubConfs of different EnhSubConfs on the PACs. It is important that a low-level CT which receives a GRANT configures the corresponding SubConf into the PA immediately and without interruption or writes it into FILMO or into a list which gives the configuration sequence, i.e., exactly when the SubConf must be written to FILMO or configured into the PA. The sequence is essential to prevent a deadlock. If the SubConf is not already stored in the low-level CT, the low-level CT must request the SubConf using GET from the higher-level CT. To comply with the method according to the present invention, local SubConfs (i.e., SubConfs that are not called up by an EnhSubConf but instead concern only the local PA) may be configured or loaded into FILMO between GET and the receipt of the SubConf (DOWNLOAD) if allowed or required by the algorithm. SubConfs of another EnhSubConf started by a GRANT received later may be started only after receipt of DOWNLOAD, as well as configuration and loading into FILMO.

[0267] Examples of the structure of SubConf are known from patent applications PACT05 and PACT10. An important difference in the method according to the present invention is its separate handling of call-up of SubConf by ID chains. An ID chain is a special embodiment of a SubConf having the following property:

[0268] For performance reasons, it is advisable to store individual SubConfs within the CTTREE, i.e., cache them. Then, if necessary, a SubConf need not be reloaded completely, but instead is sent directly to the lower-level CT from a CT which has cached the corresponding SubConf. In the case of an ID chain, it is essential for all the lower-level CTs to be loaded from a central CT according to the protocol of the present invention. To do so, it is most efficient if the CT at the lowest level in the CTTREE, which still has all the PACs listed in the ID chain as leaves, has the ID chain in its cache. CTs at an even lower level must not store anything in their cache, because they are no longer located centrally above all the PACs of the ID chain. Higher-level CTs lose efficiency because a longer communication link is necessary. If a request reaches a CT having a complete ID chain for the EnhSubConf requested, this CT triggers GRANTs to the lower-level CTs involved, with the information being split out of the ID chain so that at least the part needed in the respective branches is transmitted. To prevent crossing in such splitting, it is necessary to ensure that the next CT level will also trigger all GRANTs of its part of the EnhSubConf, namely without being interrupted by GRANTs of other EnhSubConfs. One possibility of implementing this is to transmit the respective parts of the ID chain “atomically.” To control the caching of ID chains, it is advisable to mark a split ID chain with a “SPLIT” flag, for example, during the transmission.

[0269] An ID chain is split as soon as it is loaded onto a CT which is no longer located centrally within the hierarchy of the CTTREE over all the PACs referenced within the ID chain. In this case, the ID chain is no longer managed and cached by a single CT within the hierarchy. Multiple CTs process the portion of the ID chain containing the PACs which are leaves of the respective CT. A REQUEST must always be relayed to a CT which manages all the respective PACs. This means that the first and most efficient CT in terms of hierarchy (from the standpoint of the PACs) which can convert REQUEST to GRANT is the first CT in ascending order, starting from the leaves, which has a complete, unsplit ID chain. Management of the list having allocations of PAC to ID does not require any further explanation. The list can be processed either by a program running within a CT or it may be created from a series of assembler instructions for controlling lower-level CTs.
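
The ascent of a REQUEST and the splitting of the ID chain on the way down may be sketched as follows. This Python model is a simplification under stated assumptions: the class CT and the functions request and grant are hypothetical, every CT that covers all referenced PACs is assumed to hold the complete ID chain, and atomicity is modeled only by the fact that each call completes without interleaving.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass(eq=False)
class CT:
    name: str
    pac: Optional[int] = None  # set only for low-level CTs (leaves)
    parent: Optional["CT"] = field(default=None, repr=False)
    children: List["CT"] = field(default_factory=list, repr=False)

    def add(self, child: "CT") -> "CT":
        child.parent = self
        self.children.append(child)
        return child

    def pacs(self) -> Set[int]:
        """All PACs located below this CT in the CTTREE."""
        if self.pac is not None:
            return {self.pac}
        covered: Set[int] = set()
        for child in self.children:
            covered |= child.pacs()
        return covered


def request(start: CT, id_chain: Dict[int, str]) -> CT:
    """REQUEST: ascend until a CT manages all PACs of the ID chain."""
    ct = start
    while not set(id_chain) <= ct.pacs():
        if ct.parent is None:
            raise ValueError("ID chain references PACs outside the CTTREE")
        ct = ct.parent
    return ct


def grant(ct: CT, id_chain: Dict[int, str], started: List[Tuple[int, str]]) -> None:
    """GRANT: pass each branch only its part of the (split) ID chain."""
    if ct.pac is not None:
        started.append((ct.pac, id_chain[ct.pac]))
        return
    for child in ct.children:
        part = {pac: sub for pac, sub in id_chain.items() if pac in child.pacs()}
        if part:
            grant(child, part, started)
```

In this model a chain that references PACs in two different branches is granted by their common ancestor, while a chain confined to one branch never travels higher than the CT of that branch.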

[0270] A complete ID chain will then have the following structure: ID_(EnhSubConf) → ID chain {SPLIT, (PAC₁: ID_(SubConf1)), (PAC₂: ID_(SubConf2)), (PAC₃: ID_(SubConf3)), . . . (PAC_(n): ID_(SubConfn))}

[0271] 6.1. Precaching SubConfs

[0272] Within the CTTREE, SubConfs are preloaded according to certain conditions, i.e., they are cached before they are actually needed. This method greatly improves performance within the CTTREE.

[0273] Embodiments:

[0274] A plurality of precache requests are available, the most common being:

[0275] a) A load request for an additional SubConf is programmed within a SubConf being processed at that moment on a low-level CT.

[0276] b) During data processing within the PA, a decision is made as to which SubConf is to be preloaded. The CT assigned to the PA is requested by a trigger, which is translated to the ID of a SubConf accordingly within the CT, to preload a SubConf. It is also possible for the ID of a SubConf to be calculated in the PA or to be configured in advance there. The message to the assigned CT then contains the ID directly.

[0277] The SubConf to be loaded is only cached, but not started. The start takes place only at the time when the SubConf would have been started without prior caching. The difference is that at the time of the start request, the SubConf is already stored in the low-level CT or one of the middle-level CTs and therefore is either configured immediately or is loaded very rapidly onto the low-level CT and then started. This eliminates time-consuming run-through of the entire CTTREE.

[0278] A compiler, which generates the SubConf, makes it possible to decide which SubConf is to be cached next. Within the program sequence graphs, it is possible to see which SubConfs could be executed next. These are then cached. The program execution decides in run time which of the cached SubConfs is in fact to be started.

[0279] Corresponding to the mechanism of preloading certain SubConfs (precache), a mechanism is implemented which removes cached SubConfs to make room in the memory of the CT for other SubConfs. Like precaching, deletion of certain SubConfs can be predicted by the compiler on the basis of program execution graphs.

[0280] In addition, the usual mechanisms for deletion of SubConfs (e.g., the one configured last, the one configured first, the one configured least often (see PACT10)) are implemented in the CTs in order to manage the memory of the CT accordingly. It is important that not only explicitly precached SubConfs can be deleted, but instead any SubConf in a CT memory can generally be deleted. If the garbage collector has already removed a certain SubConf, the explicit deletion becomes invalid and is ignored.

[0281] An explicit deletion can be brought about through a command which can be started by any SubConf. This includes any CT within the tree, its own CT or explicit deletion of the same SubConf (i.e., deletion of its own SubConf in which the command stands, in which case correct termination must be ensured).

[0282] Another possibility of explicit deletion is to generate, on the basis of a certain status within the PAs, a trigger which is relayed to the CT and analyzed as a request for explicit deletion.

[0283] 6.2. Interdependencies Among PAEs

[0284] For the case when the sequence in which PAEs of different PACs belonging to one EnhSubConf are configured is relevant, another method is proposed to ensure that this chronological dependence is maintained. Since there are by definition multiple CTs in the case of multiple PACs, these also notify one another to exchange information as to whether all PAEs so far which must be configured before the next in each PAC have already accepted their GO from the same configuration. One possibility of breaking up the time dependencies and preventing unallowed GOs from being sent is to exchange the exclusive right to configuration among the CTs and to reorganize the KWs so that a correct order is ensured through the sequence of their configurations and the transfer of the configuration rights. Depending on how strong the dependencies are, it may also be sufficient if both CTs configure their respective PA in parallel up to a synchronization point, then wait for one another and then continue configuring in parallel until the next synchronization point or, if none is available, until the end of the EnhSubConf.

[0285] 7. SubConf Macros

[0286] Caching of SubConf is especially efficient if as many SubConfs as possible can be cached. Efficient use of caching is advisable especially with regard to high-level language compilers, because compilers often generate recurring routines on an assembler level, namely on a SubConf level in VPU technology.

[0287] In order to reuse SubConf as often as possible, SubConf macros (SubConfM) having the following properties should be introduced:

[0288] no absolute PAE addresses are given; instead a SubConf is a prelaid-out macro which uses only relative addresses;

[0289] application-dependent constants are transferred as parameters.

[0290] The absolute addresses are not calculated until the time when a SubConf is loaded into the PA. Parameters are replaced by their actual values. To do so, a modified copy of the original SubConf is created so that either (i) this copy is stored in the memory of the CT (integrated FILMO) or (ii) it is written immediately to the PA, and only rejected KWs (REJ) are written into FILMO (separate FILMO). Especially in case (ii), for performance reasons, a design is proposed for the address adder in the hardware, sitting directly on the interface port of the CT to the PA/FILMO. Likewise, hardware implementations of parameter transformation are also conceivable, e.g., through a lookup table which is loaded before configuration.
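
The creation of the modified copy can be illustrated in software. In the following Python sketch, a SubConfM is assumed to be a list of (relative address, value) pairs in which a string value names a parameter; this representation and the function name instantiate are assumptions for illustration, and the address addition stands in for the address adder mentioned above.

```python
from typing import Dict, List, Tuple, Union

MacroKW = Tuple[int, Union[int, str]]  # (relative PAE address, constant or parameter name)


def instantiate(macro: List[MacroKW], base: int,
                params: Dict[str, int]) -> List[Tuple[int, int]]:
    """Return a modified copy with absolute addresses and actual parameter values."""
    resolved = []
    for rel_addr, value in macro:
        if isinstance(value, str):
            value = params[value]  # parameter lookup, e.g. via a preloaded table
        resolved.append((base + rel_addr, value))  # address adder
    return resolved


macro = [(0, 0x12), (1, "gain"), (2, "offset")]
copy_a = instantiate(macro, base=16, params={"gain": 3, "offset": 7})
copy_b = instantiate(macro, base=40, params={"gain": 5, "offset": 0})
```

The original macro is left untouched, so the same cached SubConfM can be placed at several positions in the PA with different constants.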

[0291] 8. Re-Storing Cache Statistics

[0292] International Patent WO 99/44120 (PACT10) describes application-dependent cache statistics and control. This method permits an additional data-dependent optimization of cache performance because the data-dependent program performance is expressed directly in cache optimization.

[0293] One disadvantage of the known method is that cache performance is optimized only during run time. When the application is restarted, the statistics are lost. It is often even more critical that when a SubConf is removed from the cache, its statistics are also lost and are no longer available when called up again even within the same application processing.

[0294] It is proposed according to the present invention that on termination of an application or removal of a SubConf from the cache, the cache statistics should be sent first together with the respective ID to the next higher-level CT by way of the known inter-CT communication until the root CT receives the respective statistics. The root CT stores the statistics in a suitable memory, namely in a volatile memory, a nonvolatile memory or a bulk memory, depending on the application. Access to the memory is optionally possible by way of a host. The statistics are stored in such a way that they are allocated to the respective SubConf so that the statistics are also loaded again when reloading the SubConf. In a restart of SubConf, the statistics are also loaded into the low-level CT.

[0295] The compiler either compiles neutral blank statistics or generates statistics which seem to be the next suitable statistics according to a suitable method. These statistics preselected by the compiler are then optimized in run time according to the method described here and/or stored and available in the optimized version the next time the application is called up.

[0296] SubConf

[0297] If a SubConf is used by several applications or by different low-level CTs within one application (or if it is called up from different routines), then it is not appropriate to keep cache statistics because it must be assumed that the different request performance and run performance in each case will produce different statistics.

[0298] Therefore, depending on the application, either no statistics are used or a SubConfM is used.

[0299] When using a SubConfM, the transfer of parameters is extended so that cache statistics are transferred as parameters. If a SubConfM is terminated, the cache statistics are written back to the SubConf (ORIGIN) which previously called up the SubConfM. In the termination of ORIGIN, the parameters are then stored together with the cache statistics of ORIGIN according to the method described and are loaded accordingly in a subsequent call-up and are again transferred as parameters to the SubConfM.

[0300] Keeping and storing application-based cache statistics is essentially also suitable for microprocessor, DSP, FPGA and similar modules.

[0301] 9. Structure of the Configuration Bus System

[0302] PACT07 describes an address- and pipeline-based data bus system structure. This bus system is suitable in general for transmitting configuration data. In order to transmit data and configurations over the same bus system, status signals indicating the type of data transmitted are introduced.

[0303] The bus system is designed so that the CT can optionally read back configuration registers and data registers from a PAE addressed previously by the CT.

[0304] Global data as defined in PACT07 as well as KWs are transmitted over the bus system. The CT acts as its own bus node. A status signal characterizes the transmission mode. For example, the following structure is possible with signals S0 and S1:
S1  S0  Meaning
0   0   Write data
0   1   Read data
1   0   Write a KW and/or a PAE address
1   1   Return a KW or any register from the addressed PAE
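
For illustration, the example encoding of S1 and S0 can be written as a small decoder. The Python names BusMode and decode are hypothetical and serve only to restate the encoding given above.

```python
from enum import Enum


class BusMode(Enum):
    """Transmission modes keyed by the status signals (S1, S0)."""
    WRITE_DATA = (0, 0)
    READ_DATA = (0, 1)
    WRITE_KW_OR_PAE_ADDRESS = (1, 0)
    RETURN_KW_OR_REGISTER = (1, 1)


def decode(s1: int, s0: int) -> BusMode:
    """Map the two status lines to the transmission mode."""
    return BusMode((s1, s0))
```

With this encoding, S1 alone distinguishes ordinary data traffic from configuration traffic on the shared bus.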

[0305] The REJ signal is added to the bus protocol (ACK) according to PACT07 to signal rejects to the CT according to FILMO protocol.

[0306] 10. Combining Individual Registers

[0307] Independent configuration registers are used for a logical separation of configuration data. The logical separation is necessary in particular for the differential configuration because logically separated configuration data is not usually known in carrying out a differential configuration. Therefore, this results in a large number of individual configuration registers, each individual register containing a comparatively small amount of information. In the following example, the 3-bit configuration values KW-A, B, C, D can be written or modified independently of one another:
0000 0000 0000 0 KW-A
0000 0000 0000 0 KW-B
0000 0000 0000 0 KW-C
0000 0000 0000 0 KW-D

[0308] Such a register set is inefficient, because only a fraction of the bandwidth of the CT bus is used.

[0309] The structure of configuration registers can be greatly optimized by assigning an enable to each configuration value, indicating whether the value is to be overwritten in the current configuration transfer.

[0310] Configuration values KW-A, B, C, D of the above example are combined in one configuration register. An enable is assigned to each value. For example, if EN-x is logical “0,” the KW-x is not changed in the instantaneous transfer; if EN-x is logical “1,” KW-x is overwritten by the instantaneous transfer.
En-A KW-A En-B KW-B En-C KW-C En-D KW-D
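
The combined register can be illustrated with a short Python sketch. The bit layout is an assumption made for this example: a 16-bit transfer word with four fields in the order A, B, C, D, each field consisting of one enable bit followed by a 3-bit value. The function names pack and apply_transfer are likewise hypothetical.

```python
from typing import Dict

FIELDS = ("A", "B", "C", "D")  # assumed field order, most significant first


def pack(updates: Dict[str, int]) -> int:
    """Build one transfer word; only the named values get their enable set."""
    word = 0
    for name in FIELDS:
        word <<= 4
        if name in updates:
            word |= 0b1000 | (updates[name] & 0b111)  # En-x = 1, then KW-x
    return word


def apply_transfer(register: Dict[str, int], word: int) -> Dict[str, int]:
    """Overwrite KW-x only where En-x is 1; all other values are kept."""
    result = dict(register)
    for index, name in enumerate(FIELDS):
        nibble = (word >> (4 * (len(FIELDS) - 1 - index))) & 0xF
        if nibble & 0b1000:
            result[name] = nibble & 0b111
    return result


before = {"A": 1, "B": 2, "C": 3, "D": 4}
after = apply_transfer(before, pack({"B": 5}))
```

A single bus transfer can thus change any subset of the four values, which is the behavior needed for differential configuration.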

[0311] 11. Wave Reconfiguration (WRC)

[0312] PACT13 describes a reconfiguration method (wave reconfiguration=WRC) in which reconfiguration is synchronized directly and chronologically with the data stream (FIG. 24 in PACT13).

[0313] It is essential for the function of wave reconfiguration that unconfigured PAEs can neither accept nor send data or triggers. This means that an unconfigured PAE behaves completely neutrally. This can be achieved in VPU technology by using handshake signals (RDY/ACK) for trigger buses and data buses (see PACT02). An unconfigured PAE then generates

[0314] no RDYs, so no data or triggers are sent,

[0315] no ACKs, so no data or triggers are received.

[0316] This mode of functioning is not only helpful for wave reconfiguration, but it is also one of the possible bases for run time reconfigurability of VPU technology.

[0317] An expansion of this method is explained below. Reconfiguration is synchronized with ongoing data processing. Within data processing in the PA, it is possible to decide

[0318] i. which next SubConf becomes necessary in the reconfiguration;

[0319] ii. at what time the SubConf must become active, i.e., with which data packet (ChgPkt) the SubConf must be linked.

[0320] The decision as to which configuration is loaded is made based on conditions and is represented by triggers (wave configuration preload=WCP).

[0321] Linking of the data packets to the KWs of a SubConf is ensured by the data bus protocol (RDY/ACK) and the CT bus protocol (CHECK, ACK/REJ). An additional signal (wave configuration trigger=WCT) indicates in which data packet (ChgPkt) reconfiguration is to be performed and optionally which new configuration is to be carried out or loaded. WCT can be implemented through simple additional lines or the trigger system of the VPU technology. Multiple WCTs may be used simultaneously in the PA, and each signal can control a different reconfiguration.

[0322] 11.1. Controlling the Wave Reconfiguration

[0323] A distinction can be made between two application-dependent WRCs:

[0324] A1) wave reconfiguration within one SubConf,

[0325] A2) wave reconfiguration of different SubConfs.

[0326] In terms of the hardware, a distinction is preferably made between two basic types of implementation:

[0327] I1) implementation in the CT and execution on request

[0328] I2) implementation through additional configuration registers (WRCReg) in the PAEs. Possible embodiments of the WRCRegs are described below. The WRCs are either

[0329] a) preloaded by the CT at the time of the first configuration of the respective SubConf, or

[0330] b) preloaded by the CT during execution of a SubConf depending on incoming WCPs.

[0331] During data processing, the WRCRegs valid at that time are selected by one or more WCTs.

[0332] The effects of wave reconfiguration on the FILMO principle are discussed below.

[0333] 11.1.1. Performing WRC According to A1

[0334] Reconfiguration by WRC is possible at any time within one and the same SubConf (A1). First, the SubConf is configured normally, so the FILMO principle is ensured. It is a requirement that during program execution, WRCs must use only resources already allocated for the SubConf.

[0335] Case I1)

[0336] WRC is performed by differential configuration of the respective PAEs. WCP is sent to the CT. Depending on the WCP, there is a jump to a token within the configured SubConf:
begin SubConf
main:
    PAE 1, CHECK&GO
    PAE 2, CHECK&GO
    . . .
    PAE n, CHECK&GO
    set TriggerPort 1 // WCT 1
    set TriggerPort 2 // WCT 2
scheduler:
    on TriggerPort 1, do main1 // jump depending on WCT
    on TriggerPort 2, do main2 // jump depending on WCT
wait:
    wait for trigger
main1:
    PAE 1, DIFFERENTIAL&GO
    PAE 2, DIFFERENTIAL&GO
    . . .
    PAE n, DIFFERENTIAL&GO
    wait for trigger
main2:
    PAE 1, DIFFERENTIAL&GO
    PAE 2, DIFFERENTIAL&GO
    . . .
    PAE n, DIFFERENTIAL&GO
    wait for trigger
end SubConf

[0337] The interface (TrgIO) between CT and WCP is configured by “set TriggerPort.” According to the FILMO protocol, TrgIO behaves like a PAE with respect to the CT, i.e., TrgIO corresponds exactly to the CHECK, DIFFERENTIAL, GO protocol and responds with ACK or REJ for each trigger individually or for the group as a whole.

[0338] Depending on whether a certain trigger

[0339] has already been configured, it responds with REJ or

[0340] is ready for configuration, it responds with ACK.

[0341] FIG. 8 from PACT10 is to be extended accordingly by including this protocol.

[0342] Upon receipt of WCT, the respective PAE starts the corresponding configuration.

[0343] Case I2)

[0344] Either

[0345] the WRCRegs are already being written during the

[0346] configuration and WCP is omitted because the complete SubConf has already been loaded into the respective PAE or

[0347] depending on certain WCPs, certain WRCs are loaded by the CT into different WRCRegs defined in the WRC. This is necessary in particular when, starting from one SubConf, it branches off into more different WRCs due to WRTs than are present as physical WRCRegs.

[0348] The trigger ports within the PAEs are configured so that certain WRCRegs are selected due to certain incoming WRTs:
begin SubConf
main:
    PAE1_TriggerPort 1
    PAE1_TriggerPort 2
    PAE1_WRCReg1
    PAE1_WRCReg2
    PAE1_BASE, CHECK&GO
    . . .
    PAE2_TriggerPort 1
    PAE2_TriggerPort 2
    PAE2_WRCReg1
    PAE2_WRCReg2
    PAE2_BASE, CHECK&GO
    . . .
    PAEn_TriggerPort 1
    PAEn_TriggerPort 2
    PAEn_WRCReg1
    PAEn_WRCReg2
    PAEn_BASE, CHECK&GO
end SubConf

[0349] 11.1.2. Performing WRC According to A2

[0350] Case I1)

[0351] The CT performing a WRC between different SubConfs corresponds in principle to A1/I1. It is essential that the trigger ports and the CT-internal sequencing correspond to the FILMO principle. KWs rejected by the PAEs (REJ) are written to FILMO. These principles are already known from PACT10.

[0352] All WCPs are executed by the CT, thus guaranteeing a deadlock-free (re)configuration. Likewise, the time of reconfiguration, which is marked by WCT, is sent to the CT and is handled by the CT atomically, i.e., all PAEs affected by the reconfiguration receive the reconfiguration request through WCT either simultaneously or at least without interruption by another reconfiguration request, which again guarantees freedom from deadlock.

[0353] More extensive embodiments are described above within the present document.

[0354] Case I2)

[0355] Either

[0356] the WRCRegs are already written during the configuration and WCP is omitted because the complete SubConf is already loaded into the respective PAE or

[0357] depending on certain WCPs, WRCs determined by the CT are loaded into different WRCRegs defined in the WRC. This is especially necessary when, starting from a SubConf, there is branching into more different WRCs due to WRTs than there are physical WRCRegs.

[0358] Several WCTs being sent to different PAEs at different times must be prevented because this results in deadlock. For example: WCT1 of a SubConf SA reaches PAE p1 in cycle t1, and WCT2 of a SubConf SB reaches PAE p2 at the same time. The PAEs are configured accordingly. At time t2, WCT1 reaches p2 and WCT2 reaches p1. A deadlock has occurred. It should also be pointed out that this example can also be applied in principle to A2-I1, which is why WCT there is sent through the trigger port of the CT and is handled by the CT.

[0359] A deadlock is also prevented by the fact that the WCTs generated by different PAEs (sources) are prioritized by a central instance (ARB) so that exactly one WCT is sent to the respective PAEs in one cycle. Various methods of prioritization can be used, with a few mentioned below as examples:

[0360] a) Using an arbiter according to the related art; the round robin arbiter according to PACT10 is especially suitable. The exact chronological order in occurrence of WCTs is usually lost here.

[0361] b) If chronological order is to be preserved, the following methods are suggested, for example:

[0362] b1) A FIFO first stores the incoming WCTs in order of receipt. WCTs received simultaneously are stored together, and if no WCT occurs at a given time, no entry is generated. An arbiter downstream from the FIFO selects one of the entries if there have been several at the same time.

[0363] b2) A method is known from PACT18 which permits time sorting of events on the basis of an associated time information (time stamp). The correct chronological order of WCTs can be ensured by analyzing this time stamp.
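
Variant b1) can be sketched as follows. The Python class WctArbiter is a hypothetical model: the downstream arbiter is reduced to a fixed priority within one FIFO entry, which is an assumption, since the text leaves the selection policy among simultaneous WCTs open.

```python
from collections import deque
from typing import Deque, List, Optional


class WctArbiter:
    """FIFO of simultaneously received WCT groups with a downstream arbiter."""

    def __init__(self) -> None:
        self.fifo: Deque[List[str]] = deque()

    def clock(self, incoming: List[str]) -> None:
        """Record the WCTs raised in this cycle; idle cycles create no entry."""
        if incoming:
            self.fifo.append(list(incoming))

    def issue(self) -> Optional[str]:
        """Send exactly one WCT per call, oldest cycle first."""
        if not self.fifo:
            return None
        entry = self.fifo[0]
        wct = entry.pop(0)  # assumed fixed priority among simultaneous WCTs
        if not entry:
            self.fifo.popleft()
        return wct
```

WCTs from different cycles are therefore never reordered; only WCTs that arrived in the same cycle are subject to arbitration.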

[0364] Suitable relaying of WCTs from ARB to the respective PAEs ensures that prioritized WCTs are received by the PAEs in the correct order. The simplest possibility of ensuring this order is for all triggers going from ARB to the respective PAEs to have exactly the same length and transit time. This is ensured by suitable programming or by a suitable layout through a router, by adjusting the wiring accordingly and by using registers to compensate for latency at the corresponding points. To ensure correct relaying, the method known from PACT18 may also be used for time synchronization of information.

[0365] No explicit prioritization of WCPs is necessary because the WCPs sent to the CT are processed properly by the FILMO principle within the CT as explained above. It is optionally possible to ensure that the time sequence is maintained in particular according to the FILMO principle (see 2.1e).

[0366] 11.1.3. Note for All Cases

[0367] The additional configuration registers of the PAEs for wave reconfiguration behave according to the FILMO principle, i.e., the registers support the states described and the sequences implemented and respond to the protocols accordingly (e.g., CHECK, ACK/REJ).

[0368] 11.2. Reconfiguration Protocols and Structure of WRCReg

[0369] The wave reconfiguration method will now be explained in greater detail on the basis of three alternative reconfiguration protocols.

[0370] Normal CT protocol: The CT reconfigures each PAE individually only after receipt of a reconfiguration request, the CT receiving a reconfiguration request for each PAE reached by ChgPkt. This method is not efficient because it entails a very high communication complexity for pipelined bus systems in particular.

[0371] Synchronized pipeline: This protocol is much more efficient. The pipelined CT bus is used as a buffer in that the pipeline register assigned to a PAE stores the KWs of this PAE until this PAE can receive the KWs. Although the CT bus pipeline (CBP) is blocked, it can be filled completely with the KWs of the wave reconfiguration.

[0372] a) If the CBP runs in the same direction as the data pipeline, a few cycles of latency time are lost until a KW of the PAE which follows directly is received by its pipeline register after a PAE has received a KW.

[0373] b) If the CBP runs opposite the data pipeline, the CBP can be filled completely with KWs which are already available at the specific PAEs. Thus, wave reconfiguration without any time lag is possible.

[0374] Synchronized shadow register (most efficient protocol): Immediately after selection of the SubConf (i) and before receipt of ChgPkt (ii), the CT writes new KWs into the shadow registers of all PAEs. The shadow registers may be implemented in any embodiment. The following possibilities are suggested in particular: a) a register stage connected upstream from the actual configuration register, b) a parallel register set which is selected by multiplexers, c) a FIFO stage upstream from the actual configuration registers. At the time when ChgPkt (ii) is received by a PAE, it copies the shadow register into the corresponding configuration register. In the optimum case, this copying takes place in such a way that no working cycle is lost. If no writing into the shadow register takes place (i.e., it is empty) despite the receipt of ChgPkt, data processing stops until the KW is received by the shadow register. If necessary, the reconfiguration request is relayed together with ChgPkt from one PAE to the next within a pipeline.
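
The synchronized shadow register protocol may be modeled in a few lines. The Python class ShadowedPAE below is a hypothetical sketch of embodiment a), a single register stage upstream from the configuration register; configurations are represented by plain strings and data processing by returning the active configuration together with the data word, both of which are simplifying assumptions.

```python
from typing import Optional, Tuple


class ShadowedPAE:
    """PAE with one shadow register in front of its configuration register."""

    def __init__(self, config: str) -> None:
        self.config = config
        self.shadow: Optional[str] = None
        self.stalled = False

    def write_shadow(self, kw: str) -> None:
        """The CT writes the next KW ahead of time."""
        self.shadow = kw
        if self.stalled:
            # ChgPkt arrived earlier; resume as soon as the KW is available.
            self._activate()

    def receive_chg_pkt(self) -> None:
        """ChgPkt marks the data packet at which the new KW becomes active."""
        if self.shadow is None:
            self.stalled = True  # shadow register empty: data processing stops
        else:
            self._activate()

    def _activate(self) -> None:
        self.config, self.shadow, self.stalled = self.shadow, None, False

    def process(self, data: int) -> Optional[Tuple[str, int]]:
        """Process one data word, or return None while stalled."""
        if self.stalled:
            return None
        return (self.config, data)
```

If the CT has written the shadow register before ChgPkt arrives, the switch costs no processing step in this model; otherwise the PAE waits until the KW is written.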

[0375] 12. Forms of Parallelism and Sequential Processing

[0376] Due to a sufficiently high reconfiguration performance, sequential computational models can be mapped in arrays in that the low-level CTs represent a more or less conventional code fetcher, and the array works with microprogrammable networking as a VLIW-ALU. Furthermore, all forms of parallelism can be mapped in arrays of computing elements:

[0377] Pipelining: Pipelines may be made up of series-connected PAEs. VPU-like protocols allow simple control of the pipeline.

[0378] Instruction level parallelism: Parallel data paths can be constructed through parallel-connected PAEs. VPU-like protocols and especially the trigger signals allow simple control.

[0379] SMP, multitasking and multiuser: Independent tasks can be executed automatically in parallel in one PA due to the freedom from deadlock of the configuration methods.

[0380] With a sufficient number of PAEs, all the essential parts of conventional microprocessors can be configured on the PA. Then there can be sequential processing of a task even without a CT. The CT need not become active again until the configured processor is to have a different functionality in part, e.g., in the ALU, or is to be replaced completely.

[0381] 13. Exemplary Embodiments and Diagrams

[0382] FIGS. 1 through 3 show the structure of a SubConf as an example. CW-PAE indicates the number of a KW within a PAE having the address PAE (e.g., 2-3 is the second KW for the PAE having address 3). In addition, the figures show the flags (C=check, D=differential, G=go), a set flag being indicated with “*”.

[0383] FIG. 1 shows the simplest linear structure of a SubConf as normally used according to PACT10. A PAE is tested during the first configuration (C), then is configured further (D) and finally is started (G) (see PAE having address 0). Simultaneous testing and starting are also possible (CG, see PAE having address 1, 0101).

[0384] FIG. 2 shows a SubConf which has been re-sorted so that a barrier (0201) has been introduced. All PAEs must be tested before the barrier. The barrier then waits until receipt of all ACKs or REJs. If no REJ occurs, the barrier is skipped, the differential configurations are performed and the PAEs are started. If a REJ occurs, the barrier is not skipped, and instead FILMO runs are executed until no more REJ occurs and then the barrier is skipped. Before the barrier, each PAE must be tested, and only thereafter can the PAEs be configured differentially and started. If testing and starting originally took place in the same cycle, the KW must now be separated (0101 0202→0203).

[0385] In FIG. 3, a SubConf is re-sorted so that no barrier is necessary, and instead a latency period during which no further check can be performed is inserted between check and receipt of ACK/REJ. This is done by combining the KWs into atoms (0301), the first KW of an atom performing a check (0302), and the block then being filled with differential KWs or optionally NOPs (0303) until the end of the latency period. The number of differential KWs depends on the latency period. For reasons of illustration, a latency period of three cycles has been selected as an example. ACK/REJ is received at 0304, whereupon a decision is made as to whether configuration is to be continued with the next KW, which may (but need not necessarily) contain a check (0305), or it is terminated on the basis of a REJ to preserve the order.

[0386] It is important to first perform a check in configuring a PAE X, then to wait for receipt of ACK/REJ; a PAE that has already been checked can be configured further during this period of time, or NOPs must be introduced. Only then can PAE X be configured further. Example: Check of PAE (0302), continuation of configuration (0306). At 0307, NOPs must be introduced after a check because no differential configurations are available. 0308 shows the splitting of configurations over multiple blocks (three in this case), with one check being omitted (0309).

[0387] FIG. 4 shows a possible state machine for implementation of PAE states. The initial status is IDLE (0401). By configuring the check flag (0405), the state machine goes into the “allocated” state (0402). Configuring the LAST flag (0409, 0408) starts the PAE; the status is “configured” (0404). By local reset (0407) the PAE goes into the “unconfigured” state (0403). In this embodiment, the PAE returns to IDLE only after a query about its status by LOCK/FREE (0406).

[0388] Local reset and LAST can also be sent by the CT through a broadcast (see moduleID).
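
The state machine of FIG. 4 may, for example, be sketched as a transition table. This is a minimal sketch under stated assumptions: the event labels ("check", "last", "local_reset", "lock_free") are names chosen for this example, the source state of each transition is inferred from the description, and events that are not listed are assumed to leave the state unchanged.

```python
from enum import Enum


class PAEState(Enum):
    IDLE = "idle"                   # 0401
    ALLOCATED = "allocated"         # 0402
    UNCONFIGURED = "unconfigured"   # 0403
    CONFIGURED = "configured"       # 0404


# (current state, event) -> next state; event names are illustrative labels.
TRANSITIONS = {
    (PAEState.IDLE, "check"): PAEState.ALLOCATED,                # 0405
    (PAEState.ALLOCATED, "last"): PAEState.CONFIGURED,           # LAST flag
    (PAEState.CONFIGURED, "local_reset"): PAEState.UNCONFIGURED, # 0407
    (PAEState.UNCONFIGURED, "lock_free"): PAEState.IDLE,         # 0406
}


def step(state: PAEState, event: str) -> PAEState:
    """Return the next state; unlisted events keep the state (an assumption)."""
    return TRANSITIONS.get((state, event), state)


state = PAEState.IDLE
for event in ("check", "last", "local_reset", "lock_free"):
    state = step(state, event)   # walks one full life cycle back to IDLE
```

Note that in this sketch a local reset alone does not return the PAE to IDLE; the LOCK/FREE query is required, as described for this embodiment.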

[0389] FIGS. 5 through 9 show possible implementations of FILMO methods according to section 5, showing only the relevant subassemblies which function as an interface with the PA. Interfaces with the CT are not described here. These can be implemented largely according to the related art (PACT10) and require only minor modifications, if any.

[0390] FIG. 5 shows the structure of a CT interface to the PA when using a STATELUT according to 5.1. A CT 0501 having RAM and integrated FILMO (0502) is shown in abstracted form and is not explained in greater detail here because its function is sufficiently well known from PACT10 and PACT05 according to the related art. The CT inquires as to the status of the PA (0503) by setting the LOCK signal (0504), whereupon each PAE whose status has changed since the last LOCK relays (0506) this change to the STATELUT (0505). This relaying takes place in such a manner that the STATELUT can allocate its status uniquely to each PAE. Several methods according to the related art are available for this purpose; for example, each PAE can send its address together with the status to the STATELUT, which then stores the status of each PAE under its address.

[0391] The CT writes KWs (0510) first into a register (0507). At the same time, a lookup is performed under the address (#) of the PAE pertaining to the respective KW in the STATELUT (0505). If the status of the PAE is “not configured,” the CT receives an ACK (0509), otherwise a REJ. A simple protocol converter (0508) converts an ACK into a RDY in order to write the KW to the PA, and REJ is converted to notRDY to prevent writing to the PA.

[0392] It should also be pointed out that relaying LOCK, RDY and KW to the PA and in the PA, like the acknowledgment of the status of the PAEs by the PA, is pipelined, i.e., it is run through registers.
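
The lookup and protocol conversion of FIG. 5 may be sketched as follows. The sketch ignores the pipelining just mentioned; the function name try_write and the dictionary representation of the STATELUT are assumptions for this example, as is the choice to reject KWs for addresses that have no STATELUT entry.

```python
from typing import Dict, Tuple


def try_write(statelut: Dict[int, str], pae_addr: int) -> Tuple[str, str]:
    """Return (acknowledgment to the CT, write signal to the PA) for one KW."""
    # STATELUT lookup (0505) under the address of the PAE the KW belongs to.
    if statelut.get(pae_addr) == "not configured":
        return "ACK", "RDY"        # converter 0508: ACK -> RDY, KW is written
    return "REJ", "notRDY"         # converter 0508: REJ -> notRDY, no write


# Status as reported by the PAEs after the last LOCK (values are examples).
statelut = {0: "not configured", 1: "configured"}
free_result = try_write(statelut, 0)
busy_result = try_write(statelut, 1)
```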

[0393] FIG. 6 illustrates the method according to 5.2 as an example. This has a relatively low level of complexity. A CT (0601) having integrated FILMO (0602) is modified so that an acknowledgment (0605) (ACK/REJ) is expected only for the first KW (0604) of an atom sent to the PA (0603). The acknowledgment is analyzed for the last KW of an atom. In the case of ACK, the configuration is continued with the next atom, and REJ causes termination of configuration of the SubConf.

[0394] FIG. 7 shows the structure of a FILMO (0701) according to 5.3. The RELJMP memory (0702) is assigned to FILMO, each entry in RELJMP being assigned to a FILMO entry. FILMO here is designed as an integrated FILMO according to PACT10, so that RELJMP represents a concatenated list of KWs to be configured. In addition, it should also be pointed out in particular that FILMO in this case may contain CT commands and concatenation according to PACT10. The concatenated list in RELJMP is generated as follows: The read pointer (0703) points to the KW which is being configured at the moment. The address of the KW rejected (REJ) most recently is stored in 0704. If the KW (0706) being configured at the moment is accepted by the PA (0707) (ACK, 0708), then the value stored in 0702 at the address to which 0703 points is added to 0703. This results in a relative jump.

[0395] If, however, the KW being configured at the moment is rejected (REJ, 0708), then first the difference between 0703 and 0704, which is calculated by a subtractor (0705), is stored in RELJMP, namely at the address of the KW rejected last and stored in 0704. The current value of 0703 is stored in 0704. Then the value stored in 0702 at the address to which 0703 points is added to 0703. This yields a relative jump. Control is assumed by a state machine (0709) which is implemented according to the sequence described here. The address for RELJMP is determined by 0709 by way of a multiplexer (0710), depending on the operation, by selecting between 0703 and 0704. In order to be able to address 0701 and 0702 efficiently and differently at the same time, it is advisable to physically separate 0702 from 0701 in the implementation in order to obtain two separate memories which can be addressed separately.

[0396] 0711 shows the functioning of the relative addressing. The address pointing at an entry in RELJMP is added to the content of RELJMP, yielding the address of the next entry.
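
The relative-jump bookkeeping of FIG. 7 may be illustrated by the following sketch. It rests on two assumptions that are not stated in the description: address 0 serves as a head sentinel that is never configured, and at the end of a run the link of the last rejected KW is closed so that it points to the end of the list. The function name filmo_run and the accepts callback (standing in for the ACK/REJ of the PA) are hypothetical.

```python
from typing import Callable, List


def filmo_run(reljmp: List[int], accepts: Callable[[int], bool]) -> List[int]:
    """Walk the concatenated list once, relinking around accepted KWs.

    reljmp[a] is the relative distance from address a to the next pending KW.
    Address 0 is an assumed head sentinel; len(reljmp) marks the list end.
    Returns the addresses offered to the PA during this run.
    """
    end = len(reljmp)
    last_rej = 0                  # register 0704, starts at the sentinel
    rp = reljmp[0]                # read pointer 0703, first pending KW
    visited = []
    while rp < end:
        visited.append(rp)
        if not accepts(rp):       # REJ: link previous rejected KW to this one
            reljmp[last_rej] = rp - last_rej   # subtractor 0705
            last_rej = rp
        rp += reljmp[rp]          # relative jump, for ACK and REJ alike
    reljmp[last_rej] = end - last_rej          # assumed: close list at the end
    return visited


reljmp = [1, 1, 1, 1, 1, 1]                       # sentinel plus KWs 1..5
run1 = filmo_run(reljmp, lambda a: a in {2, 4})   # KWs 2 and 4 accepted
run2 = filmo_run(reljmp, lambda a: a == 3)        # only 1, 3, 5 are offered
run3 = filmo_run(reljmp, lambda a: False)         # only 1 and 5 remain
```

Each run offers only the KWs that were rejected in earlier runs, in their original order, which is the property the concatenated list is meant to provide.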

[0397] FIG. 8 shows a possible implementation of the method according to 5.3 with a modified garbage collector. Entries in FILMO (0801) are managed linearly, so RELJMP is not needed. 0801 is implemented as a separate FILMO. KWs (0803) written into the PA (0802) are addressed by a read pointer (0804). All KWs are written in the order of their configuration into a FIFO or a FIFO-like memory (0805), which may also be implemented as a shift register, for example. The depth of the memory is exactly equal to the number of cycles elapsing from sending a KW to the PA until receipt of the acknowledgment (RDY/ACK, 0806).

[0398] Upon receipt of a REJ, the rejected KW, which is assigned to the REJ and is at the output of the FIFO at this time, is written into 0801. REJ is used here as a write signal for FILMO (REJ->WR). The write address is generated by a write pointer (0807), which is incremented after the write access.

[0399] Upon receipt of an ACK, nothing happens; the configured KW assigned to the ACK is ignored and 0807 remains unchanged.

[0400] This results in a new linear sequence of rejected KWs in the FILMO. Implementation of FILMO as a dual-ported RAM with separate read and write ports is proposed for performance reasons.
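
The interplay of read pointer, delay FIFO and write pointer in FIG. 8 may be sketched as a cycle-by-cycle simulation. This is only an illustration under assumptions: the function name filmo_pass, the accepts callback standing in for the acknowledgment of the PA, and the use of one Python list for both ports of the FILMO are choices made for this example.

```python
from collections import deque
from typing import Callable, Deque, List


def filmo_pass(filmo: List[str], count: int, latency: int,
               accepts: Callable[[str], bool]) -> int:
    """Send filmo[0:count] to the PA; compact rejected KWs to the front.

    Returns the number of KWs still pending after the pass.
    """
    fifo: Deque[str] = deque()   # 0805: KWs in flight, awaiting ACK/REJ
    write_ptr = 0                # 0807
    for cycle in range(count + latency):
        if cycle < count:
            fifo.append(filmo[cycle])      # read pointer 0804 equals cycle
        if cycle >= latency:
            kw = fifo.popleft()            # KW whose acknowledgment arrives now
            if not accepts(kw):            # REJ acts as write signal (REJ->WR)
                filmo[write_ptr] = kw
                write_ptr += 1             # incremented after the write access
    return write_ptr


filmo = ["a", "b", "c", "d"]
pending = filmo_pass(filmo, count=4, latency=2,
                     accepts=lambda kw: kw in {"a", "c"})
```

After the pass, the first `pending` entries of `filmo` hold the rejected KWs in their original order. Because the write pointer never overtakes the read pointer, no unread entry is overwritten.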

[0401] FIG. 9 shows an implementation of the method according to 5.4. It can be seen that this is a modification of the method according to 5.3.

[0402] The KW (0902) to be written into the PA (0901) is addressed by a read pointer (0909) in FILMO (0910). The address and flags (0902 a) of the PAE of 0902 to be configured are sent first to the PA as a test. The KW having the address of the PAE to be configured is written to a FIFO-like memory (0903, corresponding to 0805). 0902 a is transmitted to the PA in a pipeline. Access is analyzed and acknowledged in the PAE addressed. Acknowledgment (RDY/ACK) is also sent back pipelined (0904). 0903 delays exactly for as many cycles as have elapsed from sending 0902 a to the PA until receipt of the acknowledgment (RDY/ACK, 0904).

[0403] If acknowledged with ACK, the complete KW (0905) (address+data) at the output of 0903 which is assigned to the respective acknowledgment is pipelined to the PA (0906). No acknowledgment is expected for this, because it is already known that the addressed PAE will accept the KW.

[0404] In the case of REJ, the KW is written back into the FILMO (0907). A write pointer (0908) which corresponds to 0807 is used for this purpose. The pointer is incremented in this process.

[0405] 0904 is converted here by a simple protocol converter (0911) (i) into a write signal for the PA (RDY) in the case of ACK and (ii) into a write signal 0901 for the FILMO (WR) in the case of REJ.

[0406] The result is a new linear sequence of rejected KWs in the FILMO. Implementation of FILMO as a dual-ported RAM with separate read and write ports is proposed for performance reasons.

[0407] FIG. 10 shows an embodiment of the inter-CT protocol according to the present invention, showing four levels of CT: the root CT (1001), CTs of two intermediate levels (1002 a-b and 1003 a-d), the low-level CTs (1004 a-h) and their FILMOs (1005 a-h). In the PA assigned to 1004 e, a trigger is generated which cannot be translated to any local SubConf within 1004 e but instead is assigned to an EnhSubConf. 1004 e sends a REQUEST for this EnhSubConf to 1003 c. 1003 c has not cached the ID chain. EnhSubConf is partially also carried out on 1004 g, which is not a leaf of 1003 c. Thus, 1003 c relays the REQUEST to 1002 b. The hatching indicates that 1002 b might have cached the ID chain because 1004 g is a leaf of 1002 b. However, 1002 b has neither accepted nor cached the ID chain and therefore requests it from 1001. 1001 loads the ID chain from the CTR (see PACT10) and sends it to 1002 b. This process is referred to below as GRANT. 1002 b has cached the ID chain because all participating CTs are leaves of 1002 b. Then 1002 b sends GRANT to 1003 c and 1003 d as an atom, i.e., without interruption by another GRANT. The ID chain is split here and sent to two different CTs, so none of the receivers may be a common arbiter of all leaves. The SPLIT flag is set; the receivers and all lower-level CTs can no longer cache the ID chain. 1003 c and 1003 d again send GRANT to low-level CTs 1004 f and 1004 g as an atom. The low-level CTs store the incoming GRANT directly in a suitable list, indicating the order of SubConf to be configured; this list may be designed to be separate, for example, or it may be formed by performing the configuration directly by optionally entering the rejected KWs into FILMO.

[0408] There are two variants for the low-level CTs:

[0409] Either they have already cached the SubConf to be started, corresponding to the ID according to the ID chain, in which case the configuration is started immediately,

[0410] or they have not yet cached the SubConf corresponding to the ID according to the ID chain, in which case they must request it first from the higher-level CTs. The request (GET) is illustrated in FIG. 11, where it is again assumed that none of the CTs from the intermediate level has cached the SubConf. Therefore, the respective SubConf is loaded by the root CT from the CTR and sent to the low-level CTs (DOWNLOAD). This sequence is known in principle from PACT10 and therefore need not be explained in greater detail here.

[0411] In any case, however, it is essential that after receipt of a GRANT, it is executed before any other GRANT. If GRANT A is received before GRANT B, then GRANT A must be configured before GRANT B. This is also true if the SubConf of GRANT A must first be loaded while the SubConf of GRANT B would be cached in the low-level CT and could be started immediately. The order of incoming GRANTs must be maintained, because otherwise a deadlock can occur among the EnhSubConf.
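
The ordering rule for GRANTs in a low-level CT may be illustrated by a simple first-in, first-out queue. The sketch below is only an assumption-laden model: the function name, the log strings and the idea that a GET always succeeds are invented for this example; only the strict preservation of the GRANT order is taken from the description.

```python
from collections import deque
from typing import Deque, List, Set


def configure_in_grant_order(grants: Deque[str], cached: Set[str]) -> List[str]:
    """Process GRANTs strictly in order of arrival and log the actions taken."""
    log: List[str] = []
    while grants:
        subconf = grants.popleft()           # oldest GRANT first, never reordered
        if subconf not in cached:
            log.append(f"GET {subconf}")     # request from the higher-level CTs
            cached.add(subconf)              # DOWNLOAD assumed to succeed
        log.append(f"CONFIGURE {subconf}")
    return log


# GRANT A arrived before GRANT B; only B is already cached locally.
actions = configure_in_grant_order(deque(["A", "B"]), cached={"B"})
```

Although B could be started immediately, it is configured only after A has been loaded and configured.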

[0412] In a special embodiment of the method described here, CTs of the CTTREE can directly access configurations without including the higher-level CTs in that they have a connection to any type of volatile memory, nonvolatile memory or bulk memory. For example, this memory may be an SRAM, DRAM, ROM, flash, CDROM, hard drive or server system, optionally connected via a network (WAN, LAN, Internet). It should be pointed out explicitly that it is also possible for a CT to directly access a memory for configuration data, bypassing the higher-level CTs. In such a case, the configuration is synchronized within the CTTREE, particularly in the case of EnhSubConf, including higher-level CTs.

[0413] FIG. 12 shows three examples (FIGS. 12a-12c), illustrating a configuration stack of 8 CTs (1201-1208). The configuration stack is the list of SubConfs to be configured. The SubConfs are configured in the same order as they are entered in the list. For example, a configuration stack is formed by concatenation of individual SubConfs as described in PACT10 (FIGS. 26 through 28). Another possibility is a simple list of IDs pointing to SubConfs, as in FIG. 12. Lower-level entries are configured first, and higher-level entries are configured last. FIG. 12a illustrates two EnhSubConfs (1210, 1211) which are positioned correctly within the configuration stack of the individual CTs. The individual SubConfs of the EnhSubConfs are configured in the proper order without a deadlock. The order of GRANTs was preserved.

[0414] The example in FIG. 12b is also correct. Three EnhSubConfs are shown (1220, 1221, 1222). 1220 is a large EnhSubConf affecting all CTs. 1221 pertains only to CTs 1202-1206, and 1222 pertains only to CTs 1207 and 1208. All SubConfs are configured in the proper order without a deadlock. The GRANT for 1222 was processed completely before the GRANT for 1220, and the latter was processed before the GRANT for 1221.

[0415] The example in FIG. 12c illustrates several deadlock situations. In 1208, the order of GRANTs from 1230 and 1232 has been reversed, resulting in resources for 1230 being occupied in the PA allocated to 1208 and resources for 1232 being occupied in the PA allocated to 1208. These resources are always allocated in a fixed manner. This results in a deadlock, because no EnhSubConf can be executed or configured to the end.

[0416] Likewise, GRANTs of 1230 and 1231 are also chronologically reversed in CTs 1204 and 1205. This also results in a deadlock for the same reasons.

[0417] FIG. 13a shows a performance-optimized version of inter-CT communication, a download being performed directly to the low-level CT, i.e., mid-level CTs need not first receive, store and then relay the SubConfs. Instead, these CTs merely “listen” (1301, 1302, 1303, LISTENER) and cache the SubConfs. The schematic bus design is illustrated in FIG. 13b. A bypass (1304, 1305, 1306), normally in the form of a register, carries the download past the mid-level CTs.

[0418] FIG. 14 shows a possible design of a circuit between CT and PA for a simple configuration of SubConf macros. A KW is transmitted by the CT over the bus (1401). The KW is broken down into its configuration data (1402) plus PAE addresses X (1403) and Y (1404) (in the case of multidimensional addressing, more addresses would be broken down accordingly). 1405 adds an X offset to the X address, and 1406 adds a Y offset to the Y address. The offsets may be different and are stored in a register (1407). The parameterizable part of the data (1408) is sent as an address to a lookup table (1409) where the actual values are stored. The values are linked (1410) to the nonparameterizable data (1412). By way of a multiplexer (1413), it is possible to select whether a lookup is to be performed or whether the data should be used directly without lookup. The choice is made using a bit (1411). All addresses and the data are linked again and sent as a bus (1414) to the PA. Depending on implementation, the FILMO is connected upstream or downstream from the circuit described here. Integrated FILMOs are connected upstream, and separate FILMOs are connected downstream. The CT sets the address offsets and the parameter translation in 1409 via bus 1415. For the sake of simplicity, 1409 may be designed as a dual-ported RAM.

[0419] The structure of a corresponding KW is as follows:

X address    Y address    Data    Address for 1409    MUX = 1
X address    Y address    Data    Data                MUX = 0

[0420] If MUX=1, then a lookup is performed in 1409; at MUX=0, data is relayed directly to 1414.
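
The address offset and parameter lookup of FIG. 14 may be sketched as follows. The field names of the KW dataclass, the function name translate and the dictionary used for the lookup table 1409 are assumptions for this illustration; the real bit layout of a KW is not specified here.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class KW:
    x: int       # PAE X address (1403)
    y: int       # PAE Y address (1404)
    data: int    # nonparameterizable configuration data
    param: int   # lookup address if mux == 1, literal data if mux == 0
    mux: int     # selector bit (1411)


def translate(kw: KW, offset: Tuple[int, int],
              lut: Dict[int, int]) -> Tuple[int, int, int, int]:
    """Apply the X/Y offsets (1405, 1406) and the optional lookup (1409)."""
    x = kw.x + offset[0]
    y = kw.y + offset[1]
    value = lut[kw.param] if kw.mux == 1 else kw.param   # multiplexer 1413
    return x, y, kw.data, value                          # relinked for the PA


lut = {3: 99}                                    # parameters preloaded by the CT
looked_up = translate(KW(1, 2, 0xA, 3, 1), (4, 0), lut)
direct = translate(KW(1, 2, 0xA, 3, 0), (4, 0), lut)
```

The same SubConf macro can thus be placed at different positions in the array and given different parameters merely by changing the offsets and the table contents.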

[0421] FIG. 15 illustrates the execution of a graph. The next possible nodes (1 . . . 13) of the graph are preloaded (prefetch), and preceding nodes and unused jumps are deleted accordingly (delete). Within a loop, the nodes of the loop are not deleted (10, 11, 12), and corresponding nodes are removed only after termination. Nodes are loaded only if they are not already present in the memory of the CT. Therefore, multiple processing of 11 (for example) does not result in multiple loading of 12 or 10; “delete 8, 9” is ignored in 11 if 8 and/or 9 has already been removed.
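
The prefetch and delete rules may be illustrated with a small set-based model of the CT memory. The class name NodeCache and the load counter are assumptions for this sketch; only the two rules (load if not present, ignore deletion of absent nodes) are taken from the description.

```python
from typing import Iterable, Set


class NodeCache:
    """Toy model of the CT memory holding graph nodes."""

    def __init__(self) -> None:
        self.loaded: Set[int] = set()
        self.load_count = 0               # number of actual load operations

    def prefetch(self, nodes: Iterable[int]) -> None:
        for node in nodes:
            if node not in self.loaded:   # load only if not already present
                self.loaded.add(node)
                self.load_count += 1

    def delete(self, nodes: Iterable[int]) -> None:
        for node in nodes:
            self.loaded.discard(node)     # ignored if already removed


cache = NodeCache()
cache.prefetch([10, 12])   # first pass through node 11 of the loop
cache.delete([8, 9])       # 8 and 9 are not present: the delete is ignored
cache.prefetch([10, 12])   # second pass: nothing is loaded again
```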

[0422] FIG. 16 illustrates schematically multiple instantiation of a SubConf macro (1601). Various SubConfs (1602, 1603, 1604) call up 1601. Parameters for 1601 are preloaded (1610) in a lookup table (1605) by the requesting SubConf. 1605 is implemented only once but is shown several times in FIG. 16 to represent the various contents.

[0423] 1601 is called up. The KWs are transmitted to 1605, 1606 and 1607. These elements operate as follows: Based on a lookup, the corresponding content of 1605 is linked again (1606) to the KWs. The KW is sent to the PA (1608) after the multiplexer 1413 (1607), which selects whether the original KW is valid or whether a lookup has been performed.

[0424] FIG. 17 shows schematically the sequence of a wave reconfiguration. Areas shown with simple hatching represent data-processing PAEs, with 1701 representing PAEs after reconfiguration and 1703 representing PAEs before reconfiguration. Areas shown with crosshatching (1702) indicate PAEs which are in the process of being reconfigured or are waiting for reconfiguration.

[0425] FIG. 17a shows the influence of wave reconfiguration on a simple sequential algorithm. It is possible here to reconfigure exactly those PAEs to which a new function has been allocated. Since a PAE can receive a new function in each cycle, this may be performed efficiently, namely simultaneously.

[0426] A row of PAEs from the matrix of all PAEs of a VPU is shown as an example. The states in the cycles after cycle t are shown with a delay of one cycle each.

[0427] FIG. 17b shows the time effect of reconfiguration of large portions. A number of PAEs of one VPU is shown as an example, indicating the states in the cycles after cycle t with a different delay of several cycles each.

[0428] Although at first only a small portion of the PAEs is reconfigured or is waiting for reconfiguration, this area becomes larger over time, until all the PAEs have been reconfigured. The increase in size of this area means that due to the time delay in reconfiguration, more and more PAEs are waiting for reconfiguration (1702). This results in lost computing performance.

[0429] Therefore, it is proposed that a broader bus system be used between the CT (in particular the memory of the CT) and the PAEs, providing enough lines to reconfigure several PAEs at the same time within one cycle.

Not configured    Wave trigger    W    C    D    Result
X                 -               X    X    -    Wave reconfiguration
X                 -               X    -    X    REJ
-                 X               X    X    -    REJ
-                 X               X    -    X    Differential wave reconfiguration
                                  -              Normal configuration

[0430] FIG. 18 shows different configuration strategies for a reconfiguration method similar to the “synchronized shadow register” according to 11.2 as an example. The CT (1801), as well as one of several PAEs (1804), are shown schematically with only the configuration registers (1802, 1803) within the PAE being illustrated, and a unit for selecting the active configuration (1805). Additional function units within the PAE have not been shown here for reasons of simplicity. Each CT has n SubConfs (1820). In the cases −I1, the corresponding KWs of one SubConf are loaded when a WCP occurs (1(n)), and in the cases −I2, the KWs of m SubConfs from the total number of n are loaded (m(n)). The different tie-ins of WCT (1806) and WCP (1807) are shown, as are the optional WCPs (1808), as described below.

[0431] In A1-I1, a next configuration is selected within the same SubConfs by a first trigger WCP; this configuration uses the same resources, or its resources are at least already prereserved and are not occupied by any other SubConf except for that optionally generating the WCP. The configuration is loaded by the CT (1801). In the example shown here, the configuration is not executed directly, but instead is loaded into one of several alternative registers (1802). By a second trigger WCT, one of the alternative registers is selected exactly at the time of the required reconfiguration, so that the configuration previously loaded on the basis of WCP is executed.

[0432] Essentially, a certain configuration is determined and preloaded by WCP, and the time of the actual change in function corresponding to the preloaded reconfiguration is determined by WCT.

[0433] WCP and WCT may each be a vector, so that one of several configurations is preloaded by WCP(v₁), the configuration to be preloaded being specified by the source of WCP. Accordingly, WCT(v₂) selects one of several preloaded configurations. In this case, a number of 1802 corresponding to the quantity of configurations selectable by v₂ is necessary, the number normally being fixedly predetermined so that v₂ corresponds to the maximum number.

[0434] On this basis, a version having a register set 1803 with a plurality of 1802 is shown in A1-I2. In the optimal case, the number of registers in 1803 is so great that all possible following configurations can be preloaded directly, so that WCP can be eliminated and only the time of the change of function as well as the change itself are specified by WCT(v₂).
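
The division of labor between WCP (preload) and WCT (activate) may be sketched as follows. The class name WavePAE, the method names and the representation of the alternative registers 1802 as a dictionary indexed by the vector value are assumptions for this illustration.

```python
from typing import Dict


class WavePAE:
    """Toy model of a PAE with a set (1803) of alternative registers (1802)."""

    def __init__(self, active: str, slots: int) -> None:
        self.active = active                      # configuration in effect
        self.slots = slots                        # fixed number of 1802 registers
        self.alternatives: Dict[int, str] = {}    # preloaded configurations

    def on_wcp(self, index: int, kw: str) -> None:
        """WCP: the CT preloads a configuration; the function is unchanged."""
        if not 0 <= index < self.slots:
            raise IndexError("no such alternative register")
        self.alternatives[index] = kw

    def on_wct(self, v2: int) -> None:
        """WCT(v2): select a preloaded configuration at the reconfiguration time."""
        self.active = self.alternatives[v2]


pae = WavePAE("f0", slots=2)
pae.on_wcp(0, "f1")
pae.on_wcp(1, "f2")
before = pae.active    # still the old function after preloading
pae.on_wct(1)
after = pae.active     # function changed exactly when WCT arrived
```

If all possible successor configurations fit into the register set, the preload step is needed only once, and each later change of function is selected by WCT(v₂) alone.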

[0435] A2-I1 shows the WRC for the case in which a next configuration does not utilize the same resources, or its resources are not prereserved or are occupied by another SubConf in addition to that optionally generating the WCP(v₁). The freedom from deadlock of the configuration is guaranteed by the FILMO-compliant response and the configuration on WCP(v₁). The CT also starts configurations by WCT(v₂) (1806) through a FILMO-compliant atomic response to the receipt of triggers (ReconfReq) characterizing a reconfiguration time.

[0436] In A2-I2, all the following SubConfs are either preloaded into configuration register 1803 with the first loading of a SubConf or, if the number of configuration registers is not sufficient, they are reloaded by the CT by way of a WCP(v₁) which occurs in a known manner.

[0437] The triggers (ReconfReq, 1809) which determine a reconfiguration time and trigger the actual reconfiguration are first isolated in time by way of a suitable prioritizer (1810) and sent as WCT(v₂) to the PAEs so that exactly one WCT(v₂) is active on one PAE at a time, and the order of incoming WCT(v₂)s is always the same with all the PAEs involved.

[0438] In the case of A2-I1 and A2-I2, an additional trigger system is used in these embodiments. In processing of WCT by 1801, i.e., in processing by 1810, there may be a considerable delay until relaying to 1804. However, it is essential that the timing of ChgPkt is rigorously observed because otherwise the PAEs would process the following data incorrectly. Therefore, another trigger (1811, WCS=wave configuration stop) is used, which only stops data processing of PAEs until the new configuration has been activated by arrival of the WCT. WCS is usually generated within the SubConf active at that time. In many cases, ReconfReq and WCS may be identical, because if ReconfReq is generated within the SubConf currently active, this signal usually also indicates that ChgPkt has been reached.

[0439] FIG. 19 shows a variant implementation of A1-I2 and A2-I2, where a FIFO memory (1901) is used to manage the KW instead of using a register set. The order of SubConfs preselected by WCP is fixed. Due to the occurrence of WCT (or WCS, alternatively represented by 1902), only the next configuration can be loaded from FIFO. The essential function of WCS, i.e., stopping ongoing data processing, is exactly the same as that described in conjunction with FIG. 18.

[0440] FIG. 20 shows a section of a row of PAEs to illustrate as an example a reconfiguration method like the “synchronized pipeline” according to 11.2. One CT (2001) is allocated to multiple CT interface subassemblies (2004) of PAEs (2005). 2004 is integrated into 2005 and is shown with an offset only to better illustrate the function of WAIT and WCT. Signals for transmission of configuration data from 2004 to 2005 are not shown here because of the abstraction.

[0441] The CT is linked to 2004 by a pipelined bus system, 2002 representing the pipeline stages. 2002 is composed of a register (2003 b) for the configuration data (CW) and another register (2003 a) having an integrated decoder and logic. 2003 a decodes the address transmitted in CW and sends a RDY signal to 2004 if the respective local PAE is addressed or sends a RDY signal to the next stage 2002 if the local PAE is not addressed. Accordingly, 2003 a receives the acknowledgment (GNT), executed as RDY/ACK according to the known protocol, either from 2002 or 2004.

[0442] This results in a pipelined bus which transmits the CW from the CT to the addressed PAE and its acknowledgment back to the CT.

[0443] When WCT is active at 2004, pending CWs which are characterized with WAVE as part of the description are configured in 2004, and GNT is acknowledged with ACK.

[0444] If WCT is not active but CWs are pending for configuration, then GNT is not acknowledged, i.e., the pipeline is blocked until the configuration has been performed.

[0445] If 2005 is expecting a wave reconfiguration, characterized by an active WCT, and no CWs characterized with WAVE are already present at 2004, then 2004 will acknowledge with WAIT to put the PAE (2005) in a waiting, non-data-processing status until CWs characterized with WAVE have been configured in 2004.

[0446] CWs that have not been transmitted with WAVE are rejected with REJ during data processing.
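
The four cases above may be summarized in a small decision function. This is a sketch under assumptions: the function name interface_response and the return labels "BLOCK" (GNT withheld) and "RUN" (normal data processing, nothing pending) are invented for this example, and the PAE is assumed to be processing data when a CW without WAVE arrives.

```python
def interface_response(wct_active: bool, cw_pending: bool,
                       cw_is_wave: bool) -> str:
    """Return the reaction of the CT interface subassembly 2004 for one cycle."""
    if cw_pending and not cw_is_wave:
        return "REJ"      # CW without WAVE is rejected during data processing
    if wct_active and cw_pending:
        return "ACK"      # WAVE CWs are configured, GNT acknowledged with ACK
    if cw_pending:
        return "BLOCK"    # WCT not active: GNT withheld, pipeline blocked
    if wct_active:
        return "WAIT"     # wave expected but no CW present: PAE waits
    return "RUN"          # nothing pending, data processing continues


responses = [
    interface_response(wct_active=True, cw_pending=True, cw_is_wave=True),
    interface_response(wct_active=False, cw_pending=True, cw_is_wave=True),
    interface_response(wct_active=True, cw_pending=False, cw_is_wave=False),
    interface_response(wct_active=False, cw_pending=True, cw_is_wave=False),
]
```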

[0447] It should be pointed out that this figure is intended to illustrate only the basic principle. Optimization can be performed by special embodiments for certain applications. For example, incoming CWs characterized with WAVE and the associated reconfiguration can be stored temporarily by a register stage in 2004, preventing blocking of the pipeline if CWs sent by the CT are not accepted immediately by the addressed 2004. 2010 and 2011 are used to indicate the direction of data processing for further illustration.

[0448] If data processing proceeds in direction 2010, a rapid wave reconfiguration of the PAEs is possible as follows. The CT sends CWs characterized with WAVE into the pipeline in such a way that first the CWs of the most remote PAE are sent. If CWs cannot be configured immediately, the most remote pipeline stage (2002) is blocked. Then the CT sends CWs to the PAE which is then the most remote and so forth, until the data is ultimately sent to the next PAE.

[0449] As soon as ChgPkt runs through the PAEs, the new CWs can be configured in each cycle. This method is also efficient if ChgPkt is running simultaneously with transmission of CWs from the CT through the PAEs because the respective CW required for configuration may also be pending at the respective PAE in each cycle.

[0450] If data processing proceeds in the opposite direction (2011), the pipeline may optionally be configured from the PAE most remote from the CT to the PAE next to the CT. If ChgPkt does not take place simultaneously with data transmission of the CWs, the method remains optimal because on occurrence of ChgPkt, the CWs can be transmitted immediately from the pipeline to 2004.

[0451] However, if ChgPkt appears simultaneously with CWs of wave reconfiguration, this results in waiting cycles. For example, PAE B is to be configured on occurrence of ChgPkt in cycle n. CWs are pending and are configured in 2004. In cycle n+1, ChgPkt (and thus WCT) are pending at PAE C. However, in the best case, CWs of PAE C are thus transmitted only to 2002 of PAE B in this cycle, because in the preceding cycle, 2002 of PAE B was still occupied with its CW. Only in cycle n+2 are the CWs of PAE C in 2002 and can be configured. A waiting cycle has occurred in cycle n+1.

[0452] FIG. 21 shows the most general synchronization strategy for a wave reconfiguration. A first PAE 2101 recognizes the need for reconfiguration on the basis of a status that is occurring. This recognition can take place according to the usual methods, e.g., by comparison of data or states. Due to this recognition, 2101 sends a request (2103), which can be accomplished through a trigger, to one or more PAEs (2102) to be reconfigured, thereby stopping the data processing. In addition, 2101 sends a signal (2105), which may also be the same signal 2103, to a CT (2104) to request reconfiguration. 2104 reconfigures 2102 (2106), and after successful reconfiguration of all PAEs to be reconfigured, it informs 2101 (2107) regarding the end of the procedure, optionally by way of reconfiguration. Then 2101 takes back stop request 2103, and data processing is continued. 2108 and 2109 each symbolize data and trigger inputs and outputs.

[0453] FIG. 22 illustrates one possibility of ensuring a correctly timed relaying of WCT through routing measures. Several WCTs are generated for different PAEs (2201) by a central instance (2203), but they should be coordinated with one another in time. The different distances of 2201 in the matrix result in different transit times, i.e., latency times. Coordination is achieved in the present example through suitable use of pipeline stages (2202) by the router assigned to the compiler (see PACT13). The resulting latencies are indicated as d1-d5. It can be seen here that the same latencies occur in the direction of data flow (2204) in each stage (column). For example, 2205 would not be necessary by itself, because the distance of 2206 from 2203 is very small. However, one 2202 each must be inserted for 2207 and 2208 because of the transit time resulting from the longer distance, so 2205 is necessary to equalize the transit time.
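The equalization within one column can be illustrated by a small calculation. The following is a minimal sketch under assumptions: the function name padding_stages is hypothetical, latencies are counted in whole pipeline stages, and the numbers used for 2206, 2207 and 2208 are illustrative values, not figures taken from FIG. 22.

```python
def padding_stages(routing_latency):
    """Return, per PAE, how many extra pipeline stages must be inserted so
    that WCT reaches all PAEs of one column in the same cycle.

    routing_latency maps a PAE label to the number of pipeline stages its
    route from the central instance needs anyway (non-empty mapping).
    """
    slowest = max(routing_latency.values())
    return {pae: slowest - latency for pae, latency in routing_latency.items()}


# Assumed values: the short route to 2206 needs no stage of its own,
# the longer routes to 2207 and 2208 need one stage each.
column = {"2206": 0, "2207": 1, "2208": 1}
print(padding_stages(column))  # {'2206': 1, '2207': 0, '2208': 0}
```

The single padding stage computed for 2206 corresponds to the role of 2205 in the description: it is not required by the route length itself but only to match the other routes of the column.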

[0454] FIG. 23 illustrates an application of wave reconfiguration. This should illustrate an important advantage of VPU technology, namely optional utilization of PAE resources or reconfiguration time to perform a task, yielding an intelligent trade-off between cost and performance that can be adjusted by the compiler or the programmer.

[0455] A data stream is to be calculated (2301) in an array (2302) of PAEs (2304-2308). A CT (2303) assigned to the array is responsible for its reconfiguration. 2304 is responsible for recognition of the end state of data processing which makes reconfiguration necessary. This recognition is signaled to the CT. 2306 marks the beginning and 2309 the end of a branch represented by 2307 a, 2307 b or 2307 ab. PAEs 2308 are not used. The various triggers are represented by 2309.

[0456] In FIG. 23a, one of two branches 2307 a, 2307 b is selected by 2305 and is activated by trigger simultaneously with data received from 2306.

[0457] In FIG. 23b, 2307 a and 2307 b should not be completely preconfigured, but instead both possible branches should share resources 2307 ab by reconfiguration. 2305 also selects the branch necessary for data processing. Information is now sent to 2303 and also to 2306 to stop data processing until reconfiguration of 2307 ab has been completed according to FIG. 21.

[0458] FIG. 24 represents a possible implementation according to 4.2 of a state machine for sequence control of the PAE. The following states are implemented:

[0459] Not Configured (2401)

[0460] Allocated (2402)

[0461] Wait for lock (2403)

[0462] Configured (2404)

[0463] The following signals trigger a change of status:

[0464] LOCK/FREE (2404, 2408)

[0465] CHECK (2405, 2407)

[0466] RECONFIG (2406, 2409)

[0467] GO (2410, 2411)

[0468] FIG. 25 shows the structure of a high-level language compiler, which is known from PACT13 and translates ordinary sequential high-level languages (C, Pascal, Java) to a VPU system, for example. Sequential code (2511) is separated from parallel code (2508) so that 2508 is processed directly in the array of PAEs.

[0469] There are four possible embodiments for 2511:

[0470] 1. Within a sequencer of a PAE (see PACT13, 2910)

[0471] 2. By using a sequencer configured into the VPU. The compiler generates a sequencer optimized for the task, while directly generating the algorithm-specific sequencer code (see PACT13, 2801)

[0472] 3. On an ordinary external processor (see PACT13, 3103)

[0473] 4. By rapid configuration by a CT, in that the ratio between the number of PAEs within a PAC and the number of PACs is selected so that one or more PACs can be set up as dedicated sequencers whose op codes and command execution are configured by the respective CT in each operating step. To do so, the respective CT responds to the status of the sequencer to determine the following program sequence. The status is transmitted by the trigger system. The possibility that is selected will depend on the architecture of the VPU, the computer system and the algorithm.

[0474] This principle is known from PACT13. However, reference should be made explicitly to the extensions of the router and placer (2505) according to the present disclosure.

[0475] The code (2501) is first separated in a preprocessor (2502) into data flow code (2516) (which was written in a special version of the respective programming language optimized for data flow) and ordinary sequential code (2517). 2517 is tested for parallelizable subalgorithms (2503) and the sequential subalgorithms are sorted out (2518). Parallelizable subalgorithms are placed and routed as macros on a provisional basis.

[0476] In an iterative process, the macros are placed, routed and partitioned (2505) together with the data flow-optimized code (2513). Statistics (2506) evaluate the individual macros as well as their partitioning with regard to efficiency, with the reconfiguration time and the complexity of the reconfiguration entering into the efficiency analysis. Inefficient macros are removed and sorted out as sequential code (2514). The remaining parallel code (2515) is compiled and assembled (2507) together with 2516, and VPU object code is output (2508).
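The sorting step performed by the statistics (2506) can be pictured with the following sketch. It is a strongly simplified assumption, not the compiler known from PACT13: the Macro fields, the efficiency formula, its weight and the threshold are all hypothetical, and the repeated placing and routing of the iterative process is omitted, so only a single evaluation pass is shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Macro:
    name: str
    speedup: float        # hypothetical: estimated gain from running in the PAE array
    reconfig_cycles: int  # hypothetical: reconfiguration time entering the analysis


def efficiency(macro, cycle_weight=0.01):
    """Assumed metric: gain reduced by a penalty for reconfiguration time."""
    return macro.speedup - cycle_weight * macro.reconfig_cycles


def sort_out_inefficient(macros, threshold=1.0):
    """Split macros into remaining parallel code (2515) and macros that are
    removed and treated as sequential code (2514)."""
    parallel, sequential = [], []
    for macro in macros:
        if efficiency(macro) >= threshold:
            parallel.append(macro)
        else:
            sequential.append(macro)
    return parallel, sequential
```

With these assumed numbers, a macro with speedup 4.0 and 100 reconfiguration cycles stays in the parallel code, while one with speedup 1.5 and 200 cycles is sorted out as sequential code.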

[0477] Statistics regarding the efficiency of the code generated as well as individual macros (including those removed with 2514) are output (2509), so the programmer receives important information regarding optimization of the speed of the program.

[0478] Each macro of the remaining sequential code is tested for its complexity and requirements (2520). The suitable sequencer in each case is selected from a database, which depends on the VPU architecture and the computer system (2519), and this sequencer is output as VPU code (2521). A compiler (2521) generates the assembler code of the respective macro for the respective sequencer selected by 2520 and outputs it (2511). 2510 and 2520 are closely linked together. Processing may optionally proceed iteratively to find the most suitable sequencer having the fastest and minimal assembler code.

[0479] A linker (2522) compiles the assembler codes (2508, 2511, 2521) and generates executable object code (2523).

DEFINITION OF TERMS

[0480] ACK/REJ: Acknowledgment protocol of a PAE to a (re)configuration attempt. ACK indicates that the configuration has been accepted, REJ indicates that the configuration has been rejected. The protocol provides for waiting for receipt of either ACK or REJ and optionally inserting waiting cycles until then.

[0481] CT: Unit for interactive configuration and reconfiguration of configurable elements. A CT has a memory for temporary storage and/or caching of SubConfs. In special embodiments, CTs that are not root CTs also have a direct connection to a memory for SubConfs, which in this case is not loaded by a higher-level CT.

[0482] CTTREE: One-dimensional or multidimensional tree of CTs.

[0483] EnhSubConf: Configuration containing multiple SubConfs to be executed on different PACs.

[0484] Configuration: Complete executable algorithm

[0485] Configurable element: Physically any element whose exact function within the possibilities to be implemented is determined by a configuration. In particular, a configurable element may be designed as a logical function unit, arithmetic function unit, memory, peripheral interface or bus system; this includes in particular elements of known technologies such as FPGA (e.g., CLBs), DPGAs, VPUs and all elements known under the term “reconfigurable computing.” A configurable element may also be a complex combination of multiple different function units, e.g., an arithmetic unit with an integrated allocated bus system.

[0486] KW: Configuration word (also abbreviated CW in this text). One or more pieces of data intended for the configuration or part of a configuration of a configurable element.

[0487] Latency: Delay within a data transmission, which usually takes place in synchronous systems based on cycles, and therefore is given in clock cycles.

[0488] PA: Processing array, arrangement of multiple PAEs, including those of different designs.

[0489] PAC: A PA with the respective CT responsible for configuration and reconfiguration of this PA.

[0490] PAE: Processing array element, configurable element.

[0491] ReconfReq: Trigger that is generated on the basis of a status which requires reconfiguration and that characterizes this status.

[0492] Reconfiguration: Loading a new configuration, optionally simultaneously or overlapping or in parallel with data processing, without interfering with or corrupting the ongoing data processing.

[0493] Root CT: Highest CT in the CTTREE; it has a connection to the configuration memory, usually but not necessarily as the only CT.

[0494] SubConf: Part of a configuration composed of multiple KWs.

[0495] WCT: Characterizes the time at which a reconfiguration is to take place. May optionally select one of several possible configurations via transmission of additional information. WCT usually runs in exact synchronization with the termination of the data processing that is underway at that moment and is to be terminated for the reconfiguration. If WCT is transmitted later for reasons of implementation, WCS is used for synchronization of data processing.

[0496] WCP: Requests one or more alternative next configuration(s) from the CT for (re)configuration.

[0497] WCS: Stops the data processing until receipt of WCT. Must be used only if WCT does not indicate exactly the time of the required reconfiguration due to the implementation, but instead is received by the respective PAE only after the actual end of data processing of the configuration to be terminated.

[0498] Cell: Configurable element
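The waiting behavior named in the ACK/REJ definition can be sketched as follows. This is a minimal illustration under assumptions: the names Reply and await_reply are hypothetical, the reply line is modeled as one sample per clock cycle (None while nothing has arrived), and the upper bound max_wait_cycles is an added safeguard that is not part of the definition above.

```python
from enum import Enum


class Reply(Enum):
    ACK = "ACK"  # configuration accepted
    REJ = "REJ"  # configuration rejected


def await_reply(reply_line, max_wait_cycles=16):
    """Wait for ACK or REJ, counting the inserted waiting cycles.

    reply_line yields one sample per cycle: None, Reply.ACK or Reply.REJ.
    Returns (reply, waiting_cycles). The timeout is an assumption of this
    sketch only.
    """
    for waited, sample in enumerate(reply_line):
        if waited > max_wait_cycles:
            break
        if sample is not None:
            return sample, waited
    raise TimeoutError("no ACK or REJ received")
```

For example, a reply line that delivers ACK in the third cycle yields (Reply.ACK, 2), i.e., two waiting cycles were inserted before the acknowledgment arrived.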

[0499] References

[0500] PACT01 4416881

[0501] PACT02 19781412.3

[0502] PACT04 19654842.2-53

[0503] PACT05 19654593.5-53

[0504] PACT07 19880128.9

[0505] PACT08 19880129.7

[0506] PACT10 19980312.9 and 19980309.9

[0507] PACT13 PCT/DE00/01869

[0508] PACT18 10110530.4

What is claimed is:
 1. A method of operating a data processing system having a unidimensional or multidimensional configurable cell structure, wherein a flag is sent together with the configuration data, indicating that a cell will be reconfigured, in which case the cell then accepts the configuration if it has been in an unconfigured state, and otherwise rejects the configuration.
 2. A method of operating a data processing system having a unidimensional or multidimensional configurable cell structure, wherein a flag is sent together with the configuration data, indicating that a cell will be partially configured, in which case the cell then accepts the configuration if it has been in a configured state, and otherwise rejects the configuration.
 3. A method of operating a data processing system having a unidimensional or multidimensional configurable cell structure, wherein a cell is started by a certain flag.
 4. (New) A method of controlling a data processing system having a cellular structure, comprising: transmitting a first configuration word to a first processing unit in the cellular structure; processing data with the first processing unit in accordance with the first configuration word; transmitting a second configuration word to the first processing unit; transmitting a reconfiguration signal to the first unit, the reconfiguration signal indicating that the first unit should begin processing data in accordance with the second configuration word; if the first processing unit has completed processing data in accordance with the first configuration word prior to when the reconfiguration signal is received by the first processing unit, processing data with the first processing unit in accordance with the second configuration word; and if the first processing unit has not completed processing data in accordance with the first configuration word, continuing to process data with the first processing unit in accordance with the first configuration word.
 5. (New) The method of claim 4, further comprising: if the first processing unit has completed processing data in accordance with the first configuration word prior to when the reconfiguration signal is received by the first processing unit, sending an acknowledgment signal in response to the receipt of the reconfiguration signal.
 6. (New) The method of claim 4, further comprising: if the first processing unit has not completed processing data in accordance with the first configuration word, sending a rejection signal in response to the receipt of the reconfiguration signal.
 7. (New) The method of claim 4, wherein the reconfiguration signal is transmitted together with the second configuration word.
 8. (New) The method of claim 4, further comprising: storing the second reconfiguration word in the first processing unit prior to the processing of data with the first processing unit in accordance with the first configuration word, wherein the reconfiguration signal is transmitted after the processing of data with the first processing unit in accordance with the first configuration word has commenced.
 9. (New) The method of claim 4, further comprising: setting a state variable associated with the first processing unit to indicate the first processing unit is in a configured state prior to processing data with the first processing unit in accordance with the first configuration word.
 10. (New) The method of claim 9, further comprising: setting the state variable associated with the first processing unit to indicate the first processing unit is in an unconfigured state when processing of data with the first processing unit in accordance with the first configuration word has been completed.
 11. (New) The method of claim 10, further comprising: testing the state variable associated with the first processing unit when the reconfiguration signal is received by the first processing unit.
 12. (New) The method of claim 4, further comprising: transmitting a differential reconfiguration signal to the first processing unit.
 13. (New) The method of claim 12, further comprising: if the first processing unit is processing data in accordance with the first configuration word when the differential reconfiguration signal is received, terminating the processing of data by the first processing unit in accordance with the first configuration word and commencing processing of data by the first processing unit in accordance with the second configuration word.
 14. (New) The method according to claim 13, further comprising: sending an acknowledgment signal from the first processing unit if the first processing unit is processing data in accordance with the first configuration word when the differential reconfiguration signal is received by the first processing unit.
 15. (New) The method according to claim 12, further comprising: sending a rejection signal from the first processing unit if the first processing unit is not processing data in accordance with a configuration word when the differential reconfiguration signal is received by the first processing unit.
 16. (New) The method of claim 4, further comprising: delaying the transmission of the configuration signal, the delay being sufficient to prevent the second configuration signal from arriving at the first processing unit before the first processing unit has completed processing data in accordance with the first configuration word.
 17. (New) The method of claim 4, further comprising: routing the transmission of the reconfiguration signal to the first processing unit via a first transmission path to the first processing unit that is longer than a shortest transmission path to the first processing unit, the first transmission path being selected so that the first transmission path is sufficiently long to ensure that the reconfiguration signal reaches the first processing unit after the processing of data by the first processing unit in accordance with the first configuration word has been completed.
 18. (New) The method according to claim 4, further comprising: sending a start signal to the first processing unit; beginning the processing of data in the first processing unit in accordance with the first configuration word when the start signal is received by the first processing unit.
 19. (New) A method of controlling a data processing system having a cellular structure, comprising: transmitting a first configuration word to a first processing unit in the cellular structure; setting a state variable associated with the first processing unit to a value indicative that the first processing unit is in a configured state; processing data with the first processing unit in accordance with the first configuration word; when the processing of data with the first processing unit in accordance with the first configuration word has been completed, setting the state variable associated with the first processing unit to a value indicative that the first processing unit is in an unconfigured state; transmitting a second configuration word to the first processing unit; transmitting a reconfiguration signal to the first unit, the reconfiguration signal indicating that the first unit should begin processing data in accordance with the second configuration word; if the state variable indicates the first processing unit is in an unconfigured state when the reconfiguration signal is received by the first processing unit, processing data with the first processing unit in accordance with the second configuration word; and if the first processing unit is in a configured state when the reconfiguration signal is received by the first processing unit, continuing to process data with the first processing unit in accordance with the first configuration word.
 20. (New) The method according to claim 19, further comprising: if the state variable indicates the first processing unit is in an unconfigured state when the reconfiguration signal is received by the first processing unit, sending an acknowledgment signal from the first processing unit in response to the reconfiguration signal.
 21. (New) The method according to claim 19, further comprising: if the first processing unit is in a configured state when the reconfiguration signal is received by the first processing unit, sending a rejection signal from the first processing unit in response to the reconfiguration signal.
 22. (New) The method according to claim 19, further comprising: transmitting a differential reconfiguration signal to the first processing unit; if the first processing unit is in an unconfigured state when the differential reconfiguration signal is received by the first processing unit, maintaining the first processing unit in an unconfigured state; and if the first processing unit is in a configured state, processing data with the first processing unit in accordance with the second configuration word.
 23. (New) The method according to claim 22, further comprising: sending an acknowledgment in response to the differential configuration signal, if the first processing unit is in a configured state when the differential configuration signal is received by the first processing unit.
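For orientation only, the state-variable behavior recited in claims 19 to 23 can be modeled roughly as follows. The sketch is an assumption-laden illustration and does not define or limit the claims: the class and method names are hypothetical, the return to the configured state after an accepted reconfiguration is assumed, and the REJ reply to a differential reconfiguration signal in the unconfigured state is borrowed from claim 15.

```python
from enum import Enum


class State(Enum):
    UNCONFIGURED = 0
    CONFIGURED = 1


class ProcessingUnit:
    """Hypothetical model of the first processing unit."""

    def __init__(self):
        self.state = State.UNCONFIGURED
        self.active_word = None   # word currently governing data processing
        self.pending_word = None  # second configuration word, stored in advance

    def configure(self, word):
        # First configuration word: state variable is set to "configured".
        self.active_word = word
        self.state = State.CONFIGURED

    def store(self, word):
        # Second configuration word is transmitted and held until signaled.
        self.pending_word = word

    def finish_processing(self):
        # Processing under the active word is complete.
        self.state = State.UNCONFIGURED

    def reconfigure(self):
        # Reconfiguration signal: accepted only in the unconfigured state.
        if self.state is State.UNCONFIGURED:
            self.active_word = self.pending_word
            self.state = State.CONFIGURED  # assumption of this sketch
            return "ACK"
        return "REJ"  # continue processing with the first word

    def differential_reconfigure(self):
        # Differential signal: accepted only in the configured state.
        if self.state is State.CONFIGURED:
            self.active_word = self.pending_word
            return "ACK"
        return "REJ"  # unit is maintained in the unconfigured state
```

In this model a reconfiguration signal that arrives while the unit is still configured leaves the first configuration word active, whereas the differential signal has the opposite acceptance condition.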